1 of 114

Tess Smidt

2018 Alvarez Fellow

in Computing Sciences

[Figure: H2O molecule with per-atom orbital labels (O: 1s 2s 2s 2p 2p 3d; H: 1s 2s 2p)]

Symmetry and Equivariance in Neural Networks

for Scientific Data

Berkeley Lab

Deep Learning School

2020.09.03

2 of 114

Tess Smidt

2018 Alvarez Fellow

in Computing Sciences

Symmetry and Equivariance in Neural Networks

for Scientific Data

Berkeley Lab

Deep Learning School

2020.09.03

Outline

  1. Assumptions in neural networks
  2. Why symmetry appears in scientific problems
  3. Symmetry: Invariance vs. Equivariance
  4. How to make symmetry-aware models
    1. Case study: Euclidean Neural Networks
  5. Consequences of making models symmetry-aware
  6. Recap
  7. Resources

3 of 114

3


Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

4 of 114

4


Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Arrays ⇨ Dense NN

Components are independent.

5 of 114

5

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Arrays ⇨ Dense NN

2D images

⇨ Convolutional NN

Components are independent.

The same features can be found anywhere in an image. Locality.


6 of 114

6

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Arrays ⇨ Dense NN

2D images

⇨ Convolutional NN

Text ⇨ Recurrent NN

Components are independent.

The same features can be found anywhere in an image. Locality.

Sequential data. Next input/output depends on input/output that has come before.


7 of 114

7

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Arrays ⇨ Dense NN

2D images

⇨ Convolutional NN

Text ⇨ Recurrent NN

Components are independent.

The same features can be found anywhere in an image. Locality.

Sequential data. Next input/output depends on input/output that has come before.


Graph ⇨ Graph (Conv.) NN

Topological data. Nodes have features and network passes messages between nodes connected via edges.

8 of 114

8

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Arrays ⇨ Dense NN

2D images

⇨ Convolutional NN

Text ⇨ Recurrent NN

Components are independent.

The same features can be found anywhere in an image. Locality.

Sequential data. Next input/output depends on input/output that has come before.


Graph ⇨ Graph (Conv.) NN

3D physical data

⇨ Euclidean NN

Data in 3D Euclidean space. Freedom to choose coordinate system.

Topological data. Nodes have features and network passes messages between nodes connected via edges.

9 of 114

9

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Symmetries emerge from these assumptions.

Arrays ⇨ Dense NN

2D images

⇨ Convolutional NN

Text ⇨ Recurrent NN

Components are independent.

The same features can be found anywhere in an image. Locality.

Sequential data. Next input/output depends on input/output that has come before.


Graph ⇨ Graph (Conv.) NN

3D physical data

⇨ Euclidean NN

Data in 3D Euclidean space. Equivariant to choice of coordinate system.

No symmetry!

2D-translation symmetry

(forward) time-translation symm.

permutation symmetry

3D Euclidean symmetry E(3): 3D rotations, translations, and inversion

Topological data. Nodes have features and network passes messages between nodes connected via edges.

10 of 114

10

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

11 of 114

11

Translation symmetry in 2D:

Features “mean” the same thing in any location.

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

12 of 114

12

Translation symmetry in 2D:

Features “mean” the same thing in any location.

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

Symmetry of 2D objects

Boundaries “break” global translation symmetry.

Periodic boundary conditions preserve

discrete translation symmetry.

13 of 114

13

Permutation symmetry, SN:

Symmetry of sets

The freedom to list things in any order

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

14 of 114

14

Permutation symmetry, SN:

Symmetry of sets

The freedom to list things in any order

Symmetry of elements of a graph

Graph automorphism, specific nodes are indistinguishable (same global connectivity)

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

15 of 114

15

Euclidean symmetry, E(3):

Symmetry of 3D space

The freedom to choose your coordinate system

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

16 of 114

16

Euclidean symmetry, E(3):

Symmetry of 3D space

The freedom to choose your coordinate system

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

Symmetry of geometric objects

Looks the same under specific rotations, translations, and inversion (includes mirrors).

17 of 114

17

Euclidean symmetry, E(3):

Symmetry of 3D space

The freedom to choose your coordinate system

Symmetry of geometric objects

Looks the same under specific rotations, translations, and inversion (includes mirrors).

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

18 of 114

18

Three ways to make models “symmetry-aware”

e.g. How to make a model that “understands” the symmetry of atomic structures?

H -0.21463 0.97837 0.33136
C -0.38325 0.66317 -0.70334
C -1.57552 0.03829 -1.05450
H -2.34514 -0.13834 -0.29630
C -1.78983 -0.36233 -2.36935
H -2.72799 -0.85413 -2.64566
C -0.81200 -0.13809 -3.33310
H -0.98066 -0.45335 -4.36774
C 0.38026 0.48673 -2.98192
H 1.14976 0.66307 -3.74025
C 0.59460 0.88737 -1.66708
H 1.53276 1.37906 -1.39070

Coordinates are most general, but sensitive to translations, rotations, and inversion.

19 of 114

19

Approach 1:

It doesn’t matter! It’s deep learning! Throw all your data at the problem and see what you get!

[xyz coordinates repeated from the previous slide]

Three ways to make models “symmetry-aware”

e.g. How to make a model that “understands” the symmetry of atomic structures?

Coordinates are most general, but sensitive to translations, rotations, and inversion.

20 of 114

20

Approach 1:

It doesn’t matter! It’s deep learning! Throw all your data at the problem and see what you get!

Approach 2:

Convert your data to invariant representations so the neural network can’t possibly mess it up.

[xyz coordinates repeated from the previous slide]

Three ways to make models “symmetry-aware”

e.g. How to make a model that “understands” the symmetry of atomic structures?

Coordinates are most general, but sensitive to translations, rotations, and inversion.

21 of 114

21

Approach 1:

It doesn’t matter! It’s deep learning! Throw all your data at the problem and see what you get!

Approach 3:

If there’s no model that naturally handles coordinates,

we will make one.

Approach 2:

Convert your data to invariant representations so the neural network can’t possibly mess it up.

[xyz coordinates repeated from the previous slide]

Three ways to make models “symmetry-aware”

e.g. How to make a model that “understands” the symmetry of atomic structures?

Coordinates are most general, but sensitive to translations, rotations, and inversion.

22 of 114

22

Approach 1:

It doesn’t matter! It’s deep learning! Throw all your data at the problem and see what you get!

Training

Data Augmentation

Loss Function Constraints

Approach 3:

If there’s no model that naturally handles coordinates,

we will make one.

Model

Invariant models

Equivariant models

Approach 2:

Convert your data to invariant representations so the neural network can’t possibly mess it up.

Inputs

Invariant Representations

👍

😭

😍

[xyz coordinates repeated from the previous slide]

Three ways to make models “symmetry-aware”

e.g. How to make a model that “understands” the symmetry of atomic structures?

Coordinates are most general, but sensitive to translations, rotations, and inversion.

23 of 114

23

Data augmentation is the brute-force approach to teach a model how to “emulate” symmetry-awareness. For 3D data, data augmentation is expensive: roughly 500-fold augmentation.

[Plot: training without symmetry vs. training with symmetry]

24 of 114

24

Data augmentation is the brute-force approach to teach a model how to “emulate” symmetry-awareness. For 3D data, data augmentation is expensive: roughly 500-fold augmentation.

[Plot: training without symmetry vs. training with symmetry]

Data augmentation and adding loss function terms make sense when it’s difficult to formalize

  • the group with respect to which you are equivariant
  • or the quantity you want to conserve.
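
As a rough, hypothetical sketch of the brute-force augmentation mentioned above (the function and variable names are illustrative, not from the talk), one can generate random rotations with SciPy and replicate each training structure under them; this assumes the labels themselves are rotation invariant:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def augment_with_rotations(coords, label, n_copies=500, seed=0):
        """Brute-force rotational augmentation: replicate one structure under
        n_copies random 3D rotations. coords has shape [n_atoms, 3]."""
        rotations = Rotation.random(n_copies, random_state=seed)
        # Rotate every atom; keep the (assumed rotation-invariant) label unchanged.
        return [(coords @ R.T, label) for R in rotations.as_matrix()]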

25 of 114

Neural networks can’t mess up invariant representations.

You can use ANY neural network with an invariant representation.

Invariant representations can be used for other machine learning algorithms

(e.g. kernel methods).

26 of 114

If you can craft a good invariant representation -- great!

But deep learning’s specialty is feature learning.

So, maybe use a different machine learning approach (e.g. kernel methods).

Neural networks can’t mess up invariant representations.

You can use ANY neural network with an invariant representation.

Invariant representations can be used for other machine learning algorithms

(e.g. kernel methods).

27 of 114

27

Invariance vs. Equivariance

Does NOT change ⇨ Invariant

Changes deterministically ⇨ Equivariant

Properties of a vector under E(3)

Translation

Rotation

Inversion

3D vector

28 of 114

Invariant featurizations can be very expressive if well-crafted

Many invariant featurizations use equivariant operations

e.g. a (simplified) SOAP kernel for ethane molecule C2H6

  1. Project the neighbors of a given atom onto spherical harmonics (equivariant quantity).
  2. Interact signals from different atoms via a tensor dot product (equivariant operation) to produce scalars (invariant quantity).
  3. Give the scalars to the model.
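
A minimal sketch of steps 1-3, assuming SciPy's spherical harmonics; this is a deliberately simplified, SOAP-flavored construction (in the spirit of arXiv:1209.3140), not the exact descriptor used in practice:

    import numpy as np
    from scipy.special import sph_harm

    def invariant_power_spectrum(neighbor_vecs, l_max=3):
        """Project neighbor directions onto spherical harmonics (equivariant),
        then contract each degree l with itself to get rotation invariants."""
        r = np.linalg.norm(neighbor_vecs, axis=1)
        polar = np.arccos(neighbor_vecs[:, 2] / r)
        azimuth = np.arctan2(neighbor_vecs[:, 1], neighbor_vecs[:, 0])
        invariants = []
        for l in range(l_max + 1):
            # c_lm: sum of Y_lm over neighbors (an equivariant quantity).
            c = np.array([np.sum(sph_harm(m, l, azimuth, polar))
                          for m in range(-l, l + 1)])
            # The power spectrum sum_m |c_lm|^2 is invariant under rotation.
            invariants.append(float(np.sum(np.abs(c) ** 2)))
        return np.array(invariants)

The resulting invariants are what gets handed to the model in step 3.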

29 of 114

Models: Invariant and Equivariant

30 of 114

30

For a function to be equivariant means that we can act on our inputs with g

OR act on our outputs with g and we get the same answer (for every operation).

For a function to be invariant means g acts as the identity on the output (no change).

[Diagram: (in → Layer → out, then apply g) = (apply g to in, then Layer → out)]
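
A toy numerical check of this picture, using a hand-built layer (the centroid of a point cloud) that happens to be rotation equivariant; all names are illustrative:

    import numpy as np
    from scipy.spatial.transform import Rotation

    def layer(points):
        # A toy equivariant "layer": the centroid of the point cloud.
        return points.mean(axis=0)

    points = np.random.randn(10, 3)
    g = Rotation.random(random_state=0).as_matrix()  # a random rotation g

    out_then_g = g @ layer(points)    # apply the layer, then act with g on the output
    g_then_out = layer(points @ g.T)  # act with g on the input, then apply the layer
    assert np.allclose(out_then_g, g_then_out)  # the two paths agree: equivariance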

31 of 114

31

Why limit yourself to equivariant functions?

You can substantially shrink the space of functions you need to optimize over.

This means you need less data to constrain your function.

All learnable functions

All learnable equivariant functions

All learnable functions constrained by your data.

Functions you actually wanted to learn.

32 of 114

32

Why not limit yourself to invariant functions?

You have to guarantee that your input features already

contain any necessary equivariant interactions (e.g. cross-products).

All learnable equivariant functions

Functions you actually wanted to learn.

All learnable invariant functions.

All invariant functions constrained by your data.

OR

33 of 114

Convolutions: Local vs. Global Symmetry

Convolutions capture local symmetry. Interaction of features in later layers yields global symmetry.

e.g. Coordination environments in crystals

Atomic systems form geometric motifs that can appear at multiple locations and orientations.

(Local symmetry)

Space group:

Symmetry of unit cell

(Global symmetry)

34 of 114

How Euclidean Neural Networks achieve equivariance to Euclidean symmetry

(high level)

35 of 114

35

We encode the symmetries of 3D Euclidean space (3D translation- and 3D rotation-equivariance).

We use points. Images of atomic systems are sparse and imprecise.

vs.

Other atoms

Convolution center

We use continuous convolutions with atoms as convolution centers.

Euclidean Neural Networks are similar to convolutional neural networks...

36 of 114

36

We encode the symmetries of 3D Euclidean space (3D translation- and 3D rotation-equivariance).

We use points. Images of atomic systems are sparse and imprecise.

vs.

Other atoms

Convolution center

We use continuous convolutions with atoms as convolution centers.

Euclidean Neural Networks are similar to convolutional neural networks...

37 of 114

37

We encode the symmetries of 3D Euclidean space (3D translation- and 3D rotation-equivariance).

Other atoms

Convolution center

We use continuous convolutions with atoms as convolution centers.

We use points. Images of atomic systems are sparse and imprecise.

vs.

Euclidean Neural Networks are similar to convolutional neural networks...

38 of 114

38

Translation equivariance

Rotation equivariance

39 of 114

39

Translation equivariance

Convolutional neural network

Rotation equivariance?

40 of 114

40

Translation equivariance

Convolutional neural network

Rotation equivariance

Data augmentation

Radial functions (invariant)

Want a network that both preserves geometry and exploits symmetry.

41 of 114

Convolutional filters based on learned radial functions and spherical harmonics.


Euclidean Neural Networks are similar to convolutional neural networks,

EXCEPT with special filters and tensor algebra!
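
Written out (in standard tensor-field-network form, arXiv:1802.08219; the symbols here are my own shorthand, not the slide's), each filter factorizes into a learned radial part and a fixed angular part:

    F^{(l)}_m(\vec r) = R^{(l)}(|\vec r|) \, Y^{(l)}_m(\hat r)

where R^{(l)} is a learned radial function and Y^{(l)}_m are the spherical harmonics, which makes the filter rotation equivariant by construction.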

42 of 114

Everything in the network is a geometric tensor!

Scalar multiplication gets replaced with the more general tensor product.

Contract two indices to one with Clebsch-Gordan Coefficients.

Example: How do you “multiply” two vectors?

Dot product → Scalar (rank 0)

Cross product → Vector (rank 1)

Outer product → Matrix (rank 2)

Euclidean Neural Networks are similar to convolutional neural networks,

EXCEPT with special filters and tensor algebra!
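
A small numpy illustration of this "multiply two vectors" example (my own decomposition, consistent with the dot / cross / outer products above): the full outer product splits into a rank-0 trace, an antisymmetric rank-1 piece carrying the cross product, and a symmetric traceless rank-2 piece:

    import numpy as np

    a, b = np.random.randn(3), np.random.randn(3)

    outer = np.outer(a, b)                # full rank-2 tensor product
    scalar = np.trace(outer)              # rank 0: the dot product a . b
    antisym = 0.5 * (outer - outer.T)     # rank 1: encodes the cross product a x b
    sym_traceless = 0.5 * (outer + outer.T) - (scalar / 3.0) * np.eye(3)  # rank 2

    # The three pieces reassemble into the original outer product,
    assert np.allclose(outer, antisym + sym_traceless + (scalar / 3.0) * np.eye(3))
    # and the antisymmetric piece holds the cross product components.
    cross = 2.0 * np.array([antisym[1, 2], antisym[2, 0], antisym[0, 1]])
    assert np.allclose(cross, np.cross(a, b))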

43 of 114

43

Given a molecule and a rotated copy,

predicted forces are the same up to rotation.

(Predicted forces are equivariant to rotation.)

Additionally, networks generalize to molecules with similar motifs.

44 of 114

44

Primitive unit cells, conventional unit cells, and supercells of the same crystal produce the same output (assuming periodic boundary conditions).

45 of 114

45

[Figure: H2O molecule with per-atom orbital labels (O: 1s 2s 2s 2p 2p 3d; H: 1s 2s 2p)]

Networks can predict molecular Hamiltonians in any orientation

from seeing a single example.

46 of 114

46

Equivariance can have unintuitive consequences.

47 of 114

47

Equivariance can have unintuitive consequences.

Partition graph with permutation equivariant function into two sets using ordered labels.

Predict node labels

[0, 1] vs. [1, 0]

48 of 114

48

Equivariance can have unintuitive consequences.

Partition graph with permutation equivariant function into two sets using ordered labels.

You can’t due to degeneracy.

[0, 1]

[1, 0]

[0, 1]

[1, 0]

There’s nothing to distinguish one partition as “first” vs. “second”.

Predict node labels

[0, 1] vs. [1, 0]

49 of 114

49

Equivariance can have unintuitive consequences.

The input, intermediate, and output data

of Euclidean Neural Networks must be geometric tensors.

[Figure: vector, pseudovector, double-headed ray, and spiral, distinguished by how they transform under rotation (⊥), reflection (∥), and inversion; a 3x3 matrix expressed as a linear combination of spherical harmonics]

50 of 114

50

Equivariance can have unintuitive consequences.

Euclidean neural networks produce outputs with equal or higher symmetry than inputs.

[Figure: tetrahedron and octahedron inputs and the outputs of three randomly initialized models]

51 of 114

51

Equivariance can have unintuitive consequences.

Equivariant neural networks can be used as “symmetry compilers”

and can be used to find “missing symmetry-implied” data.

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

52 of 114

52

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

Symmetrically degenerate rectangle!

Equivariance can have unintuitive consequences.

Equivariant neural networks can be used as “symmetry compilers”

and can be used to find “missing symmetry-implied” data.

D2h → D4h

D4h → D2h

53 of 114

53

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

Network predicts degenerate outcomes!

D2h → D4h

D4h → D2h

Equivariance can have unintuitive consequences.

Equivariant neural networks can be used as “symmetry compilers”

and can be used to find “missing symmetry-implied” data.

54 of 114

54

D2h → D4h: the model learns anisotropic inputs and can then fit.

[Figure: Input and Output for the D2h → D4h and D4h → D2h tasks, with spherical-harmonic content L = 0 and L = 0 + 2 + 4]

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

Use gradients to “find” what’s missing.

Equivariance can have unintuitive consequences.

Equivariant neural networks can be used as “symmetry compilers”

and can be used to find “missing symmetry-implied” data.

D4h → D2h

55 of 114

55

A Quick Recap!

Symmetry emerges from assumptions about which examples “mean” the same thing.

2D-translation symmetry: Different locations of the same pattern mean the same thing.

Euclidean symmetry: Different coordinate systems represent the same physical system.

Permutation symmetry: Differently ordered arrays represent the same set.

Three approaches to make models “symmetry-aware”

😭 Training: Data augmentations

👍 Inputs: Invariant Representations

😍 Model: Invariant / Equivariant Operations

Invariant does NOT change. Equivariant changes deterministically.

Equivariant models substantially shrink the space of functions you need to optimize over; you need less data to constrain your function.

Equivariant models are generally more expressive than invariant models.

Equivariance can have unintuitive consequences.

Sometimes these consequences lead you to new approaches.

56 of 114

developers and collaborators of e3nn

(and atomic architects)

Mario Geiger

Ben Miller

Tess Smidt

Kostiantyn Lapchevskyi

Boris Kozinsky

Simon Batzner

Josh Rackers

Thomas Hardin

Tahnee Gehm

57 of 114

tensor field networks

Google Accelerated Science Team

Stanford

Patrick Riley

Steve Kearnes

Nate Thomas

Lusann Yang

Kai Kohlhoff

Li Li

58 of 114

58

Feel free to reach out if you have any questions!

Resources on Euclidean neural networks:

e3nn Code (PyTorch):

http://github.com/e3nn/e3nn

e3nn_tutorial:

http://blondegeek.github.io/e3nn_tutorial/

Papers:

Tensor Field Networks (arXiv:1802.08219)

3D Steerable CNNs (arXiv:1807.02547)

Clebsch-Gordan Nets (arXiv:1806.09231)

Cormorant (arXiv:1906.04015)

SE(3)-Transformers (arXiv:2006.10503)

e3nn for symm breaking (arXiv:2007.02005)

e3nn on QM9 (arXiv:2008.08461)

My past talks (look for video / slide links):

https://blondegeek.github.io/talks

Resources

Pytorch geometric (graphs / meshes / points)

http://github.com/rusty1s/pytorch_geometric

Papers:

SOAP Kernels (arXiv:1209.3140)

Harmonic networks (arXiv:1612.04642)

Neural Message Passing (arXiv:1704.01212)

SchNet (arXiv:1706.08566)

Deep Sets (arXiv:1703.06114)

Gauge equivariant NN (arXiv:1902.04615)

Lorentz equivariant NN (arXiv:2006.04780)

Natural Graph Networks (arXiv:2007.08349)

Books on group theory (mostly E(3) but some general)

  • Group Theory: Application to the Physics of Condensed Matter by Dresselhaus, Dresselhaus, and Jorio
  • Applied group theory, Cracknell
  • Math. Theory of Symm. in Solids, Bradley and Cracknell

Workshop Tomorrow (9/4)!

...on Equivariance and Data Augmentation

online, hosted by the University of Pennsylvania

https://sites.google.com/view/equiv-data-aug/home

Tess Smidt

tsmidt@lbl.gov

59 of 114

59

Calling in backup (slides)!

60 of 114

60

Let g be a 3D rotation matrix.

Spherical harmonics of a given L transform together under rotation: rotating a signal written as a linear combination of the L = 1 spherical harmonics,

    a_{-1} Y_{1,-1} + a_0 Y_{1,0} + a_1 Y_{1,1},

by g gives another such combination,

    b_{-1} Y_{1,-1} + b_0 Y_{1,0} + b_1 Y_{1,1}, with b = D(g) a.

D is the Wigner-D matrix. It has shape (2L + 1) × (2L + 1) and is a function of g.

61 of 114

61

Analogous to... the laws of (non-relativistic) physics have Euclidean symmetry,

even if systems do not.

The network is our model of physics. The input to the network is our system.

[Diagram: point charges q in a magnetic field B]

62 of 114

62

A Euclidean symmetry preserving network produces outputs that preserve

the subset of symmetries induced by the input.

O(3): 3D rotations and inversions

SO(2) + mirrors (C∞v): 2D rotations and mirrors along the cone axis

Oh: discrete rotations and mirrors

Pm-3m (221): discrete rotations, mirrors, and translations

63 of 114

63

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

[Figure: three candidate situations a, b, c, each drawn with mirror planes labeled m]

64 of 114

64

[Figure: three candidate situations a, b, c, each drawn with mirror planes labeled m]

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

65 of 114

65

[Figure: situations a, b, c with mirror planes m; symmetry labels m and 2m]

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

66 of 114

66

[Figure: situations a, b, c with mirror planes m; symmetry labels m, 2m, and g]

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

67 of 114

A bit of group theory! Don’t worry, just a bit!

Formally, what are invariant vs. equivariant functions?

function (neural network)...

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

68 of 114

A bit of group theory! Don’t worry, just a bit!

Formally, what are invariant vs. equivariant functions?

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

69 of 114

A bit of group theory! Don’t worry, just a bit!

Formally, what are invariant vs. equivariant functions?

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

equivariant to x if

70 of 114

A bit of group theory! Don’t worry, just a bit!

Formally, what are invariant vs. equivariant functions?

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

If we want to be equivariant to x, this has to be the case…

weights must be “scalars”

equivariant to x if

71 of 114

A bit of group theory! Don’t worry, just a bit!

Formally, what are invariant vs. equivariant functions?

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

If we want to be equivariant to x, this has to be the case…

weights must be “scalars”

equivariant to x if

72 of 114

A bit of group theory! Don’t worry, just a bit!

Formally, what are invariant vs. equivariant functions?

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

If we want to be equivariant to x, this has to be the case…

weights must be “scalars”

equivariant to x if

(special case) invariant to x if
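
The equations themselves did not survive extraction; a standard way to write what these slides describe, using the pieces named above (a function f with input x, output y, weights W, a group element g, and a representation D of g acting on each vector space), is roughly:

    y = f(x; W)

    f is equivariant to x if:  f(D_x(g)\, x; W) = D_y(g)\, f(x; W)  for all g,

    which in turn requires the weights to be "scalars":  D_W(g)\, W = W.

    (Special case) f is invariant to x if D_y(g) is the identity, so  f(D_x(g)\, x; W) = f(x; W).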

73 of 114

73

M. Zaheer et al., Deep Sets, NeurIPS 2017 (arXiv:1703.06114)
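
A minimal numpy sketch of the Deep Sets construction cited here: encode each element with a shared function phi, pool with a permutation-invariant sum, then apply a readout rho (the random weight matrices below are illustrative stand-ins for learned networks):

    import numpy as np

    rng = np.random.default_rng(0)
    W_phi = rng.standard_normal((32, 4))   # shared per-element encoder (phi)
    W_rho = rng.standard_normal((1, 32))   # readout (rho)

    def deep_set(x):
        # x: [n_elements, 4]; the output is invariant to reordering the rows.
        encoded = np.maximum(x @ W_phi.T, 0.0)  # apply phi to every element
        pooled = encoded.sum(axis=0)            # permutation-invariant sum pooling
        return W_rho @ pooled

    x = rng.standard_normal((10, 4))
    assert np.allclose(deep_set(x), deep_set(rng.permutation(x)))  # order-free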

74 of 114

74

Convolutional neural networks can “cheat” by being sensitive to “boundaries”.

(e.g. Predict geodesics on projected maps with and without periodic boundary conditions)

User: Stebe

https://en.wikipedia.org/wiki/Gall-Peters_projection

Nodes can be distinguished due to differing topology by latitude (e.g. poles)!

Boundaries break symmetry.

Pixels cannot be distinguished due to translation equivariance.

75 of 114

75

In the physical sciences...

What are our data types?

3D geometry and geometric tensors...

...which transform predictably under 3D rotation, translation, and inversion.

These data types assume Euclidean symmetry.

⇨ Thus, we need neural networks that preserve Euclidean symmetry.

76 of 114

76

Scalars

  • Energy
  • Mass
  • Isotropic *

Vectors

  • Force
  • Velocity
  • Acceleration
  • Polarization

Pseudovectors

  • Angular momentum
  • Magnetic fields

Matrices, Tensors, …

  • Moment of Inertia
  • Polarizability
  • Interaction of multipoles
  • Elasticity tensor (rank 4)


Atomic orbitals

Output of Angular Fourier Transforms

Vector fields on spheres

(e.g. B-modes of the Cosmic Microwave Background)

Geometric tensors take many forms. They are a general data type beyond materials.
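
A tiny numpy illustration of the scalar / vector / pseudovector distinction in this list, using position, momentum, and angular momentum as stand-ins (my own example):

    import numpy as np

    r = np.random.randn(3)   # a position: a true vector
    p = np.random.randn(3)   # a momentum: a true vector

    s = r @ p                # a scalar
    L = np.cross(r, p)       # angular momentum: a pseudovector

    # Under inversion, every true vector flips sign...
    r_inv, p_inv = -r, -p
    # ...so scalars and pseudovectors built from them are unchanged:
    assert np.isclose(r_inv @ p_inv, s)
    assert np.allclose(np.cross(r_inv, p_inv), L)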

77 of 114

77

The input to our network is geometry and features on that geometry.

78 of 114

78

The input to our network is geometry and features on that geometry.

We categorize our features by how they transform under rotation.

Features have “angular frequency” L

where L is a non-negative integer.

Scalars (frequency L = 0): don’t change with rotation.

Vectors (frequency L = 1): change with the same frequency as the rotation.

3x3 matrices: a combination of frequencies L = 0, 1, 2.

79 of 114

79

Our unit test: Trained on 3D Tetris shapes in one orientation,

these networks can perfectly identify these shapes in any orientation.

[Figure: 3D Tetris shapes in the training orientation (TRAIN) and in new orientations (TEST); chiral shapes labeled]

80 of 114

80

Applications so far...

  • Finding order parameters of 2nd order structural phase transitions
  • Molecular dynamics (Harvard)
  • Molecule and crystal property prediction (FU Berlin)
  • Inverting invariant representations of atomic geometries (Sandia)
  • Autoencoding Geometry
  • Predicting molecular Hamiltonians (TU Berlin)
  • Long range interactions (FU Berlin, TU Berlin)
  • Electron density prediction for large molecules (Sandia)
  • Predicting chemical shifts for NMR (Merck, MIT)
  • Conditional protein design (UW)
  • Inverse design of optical properties of nanoparticle assemblies (LBL and UW)
  • Phonon properties of crystal structures (MIT)
  • Anharmonic elastic properties of crystal structures (UTEP)
  • ...

81 of 114

81

Predict ab initio forces for molecular dynamics

Preliminary results originally presented at

APS March Meeting 2019.

Paper in progress.

Testing on liquid water, Euclidean neural networks (Tensor-Field Molecular Dynamics) require less data to train than traditional networks to get state of the art results.

Data set from: Zhang, L. et al., PRL 120(14), 143001 (2018).

Boris Kozinsky

Simon Batzner

82 of 114

Euclidean neural networks can manipulate geometry,

which means they can be used for generative models such as autoencoders.

83 of 114

geometry

features

To encode/decode, we have to be able to convert geometry into features and vice versa.

We do this via spherical harmonic projections.

Euclidean neural networks can manipulate geometry,

which means they can be used for generative models such as autoencoders.

84 of 114

84

Equivariant neural networks can learn to invert invariant representations.

Which can be used to recover geometry.

Network can predict spherical harmonic projection...

Invariant features + coordinate frame

ENN

Peak finding

Josh Rackers

Thomas Hardin

85 of 114

We can also build an autoencoder for geometry: e.g. an autoencoder on 3D Tetris.

[Figure: two Pooling layers followed by two Unpooling layers; centers deleted at each pooling step]

86 of 114

We can also build an autoencoder for geometry: e.g. an autoencoder on 3D Tetris.

[Figure: two Pooling layers followed by two Unpooling layers]

87 of 114

87

Several groups converged on similar ideas around the same time.

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

(arXiv:1802.08219)

Tess Smidt*, Nathaniel Thomas*, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, Patrick Riley

Points, nonlinearity on norm of tensors

Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network

(arXiv:1806.09231)

Risi Kondor, Zhen Lin, Shubhendu Trivedi

Only use tensor product as nonlinearity, no radial function

3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data

(arXiv:1807.02547)

Mario Geiger*, Maurice Weiler*, Max Welling, Wouter Boomsma, Taco Cohen

Efficient framework for voxels, gated nonlinearity

*denotes equal contribution

88 of 114

88

Several groups converged on similar ideas around the same time.

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

(arXiv:1802.08219)

Tess Smidt*, Nathaniel Thomas*, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, Patrick Riley

Points, nonlinearity on norm of tensors

Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network

(arXiv:1806.09231)

Risi Kondor, Zhen Lin, Shubhendu Trivedi

Only use tensor product as nonlinearity, no radial function

3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data

(arXiv:1807.02547)

Mario Geiger*, Maurice Weiler*, Max Welling, Wouter Boomsma, Taco Cohen

Efficient framework for voxels, gated nonlinearity

*denotes equal contribution

Tensor field networks + 3D steerable CNNs

= Euclidean neural networks (e3nn)

89 of 114

89

Let g be a 3D rotation matrix.

Spherical harmonics of a given L transform together under rotation: rotating a signal written as a linear combination of the L = 1 spherical harmonics,

    a_{-1} Y_{1,-1} + a_0 Y_{1,0} + a_1 Y_{1,1},

by g gives another such combination,

    b_{-1} Y_{1,-1} + b_0 Y_{1,0} + b_1 Y_{1,1}, with b = D(g) a.

D is the Wigner-D matrix. It has shape (2L + 1) × (2L + 1) and is a function of g.

90 of 114

How to encode (Pooling layer): recursively convert geometry to features.

[Diagram: Geometry → Convolve → Bloom (make points to cluster) → Symmetric Cluster (cluster bloomed points) → Combine (convolve with the point origins of cluster members) → New Geometry]

91 of 114

How to decode (Unpooling layer): recursively convert features to geometry.

[Diagram: Geometry → Convolve → Bloom (make new points) → Cluster (merge duplicate points) → Combine (convolve with the origin point of new points) → New Geometry]

92 of 114

92

Discrete geometry

Discrete geometry

Reduce geometry to single point.

Create geometry from single point.

We want to convert geometric information (3D coordinates of atomic positions)

into features on a trivial geometry (a single point)

and back again.

Single point with continuous

latent representation

(N dimensional vector)

93 of 114

93

Reduce geometry to single point.

Create geometry from single point.

Atomic structures are hierarchical and can be constructed from recurring geometric motifs.

We want to convert geometric information (3D coordinates of atomic positions)

into features on a trivial geometry (a single point)

and back again.

Discrete geometry

Discrete geometry

Single point with continuous

latent representation

(N dimensional vector)

94 of 114

94

Reduce geometry to single point.

Create geometry from single point.

  • Encode geometry
  • Encode hierarchy

(Need to do this in a recursive manner)

We want to convert geometric information (3D coordinates of atomic positions)

into features on a trivial geometry (a single point)

and back again.

Discrete geometry

Discrete geometry

Single point with continuous

latent representation

(N dimensional vector)

Atomic structures are hierarchical and can be constructed from recurring geometric motifs.

  • Decode geometry
  • Decode hierarchy

95 of 114

To autoencode, we have to be able to convert geometry into features and vice versa.

We do this via spherical harmonic projections.

96 of 114

96

What a computational materials physicist does:

Given an atomic structure (Structure), use quantum theory and supercomputers to determine where the electrons are and what the electrons are doing (Properties).

[Figure: silicon (Si) crystal structure and its band structure, energy (eV) vs. momentum]

97 of 114

We want to use deep learning to speed up calculations, hypothesize new structures, perform inverse design, and organize these relations.

[Diagram: Structure → (quantum theory / molecular dynamics + supercomputers) → Properties, with deep-learning arrows labeled Zooooom!, Hypothesize, Inverse Design, and Map]

98 of 114

[Diagram repeated from the previous slide]

We want to use deep learning to speed up calculations, hypothesize new structures, perform inverse design, and organize these relations.

The problems start here

99 of 114

99

Given a single example of a degenerate solution,

it knows what other solutions are possible by symmetry.

(Useful for ensuring you’re not biasing your sampling.)

100 of 114

100

To be rotation-equivariant means that we can rotate our inputs

OR rotate our outputs and we get the same answer (for every operation).

[Diagram: (in → Layer → out, then Rot) = (Rot applied to in, then Layer → out)]

101 of 114

101

For L=1 ⇨ L=1, the filters will be learned, radially dependent linear combinations of the L = 0, 1, and 2 spherical harmonics.

[Animation: random filters for L=1 ⇨ L=1 (3 input L=1 channels by 3 output L=1 channels) as a function of increasing r, for 0 ≤ r ≤ rmax; radial distance shows magnitude as a function of angle, color shows sign (+ / −)]

102 of 114

102

103 of 114

103

Predictions for Oh symmetry

Ground Truth

Prediction of network trained with symmetry breaking input and given symmetry breaking input along z.

Prediction of network trained with symmetry breaking input but given trivial input

(single scalar).

Superposition of 6 rotationally degenerate solutions.

104 of 114

104

A brief primer on deep learning

deep learning ⊂ machine learning ⊂ artificial intelligence

model | deep learning | data | cost function | way to update parameters | conv. nets

105 of 114

105

model (“neural network”):

Function with learnable parameters.

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

106 of 114

106

model (“neural network”):

Function with learnable parameters.

Linear transformation

Element-wise nonlinear function

Learned

Parameters

Ex: "Fully-connected" network

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning
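
A minimal numpy sketch of such a fully connected network (layer sizes and names are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Learned parameters: a weight matrix and bias per layer.
    W1, b1 = rng.standard_normal((64, 16)), np.zeros(64)
    W2, b2 = rng.standard_normal((1, 64)), np.zeros(1)

    def relu(z):
        return np.maximum(z, 0.0)        # element-wise nonlinear function

    def model(x):
        # Linear transformation -> nonlinearity -> linear transformation.
        h = relu(W1 @ x + b1)
        return W2 @ h + b2

    y = model(rng.standard_normal(16))   # one 16-dimensional input -> one output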

107 of 114

107

model (“neural network”):

Function with learnable parameters.

Neural networks with multiple layers can learn more complicated functions.

Learned

Parameters

model | deep learning | data | cost function | way to update parameters | conv. nets

Ex: "Fully-connected" network

A brief primer on deep learning

108 of 114

108

model (“neural network”):

Function with learnable parameters.

Neural networks with multiple layers can learn more complicated functions.

Learned

Parameters

model | deep learning | data | cost function | way to update parameters | conv. nets

Ex: "Fully-connected" network

A brief primer on deep learning

109 of 114

109

deep learning:

Add more layers.

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

110 of 114

110

data:

Want lots of it. Model has many parameters. Don't want to easily overfit.

https://en.wikipedia.org/wiki/Overfitting

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

111 of 114

111

cost function:

A metric to assess how well the model is performing.

The cost function is evaluated on the output of the model.

Also called the loss or error.

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

112 of 114

112

way to update parameters:

Construct a model that is differentiable

Easiest to do with differentiable programming frameworks: e.g. Torch, TensorFlow, JAX, ...

Take derivatives of the cost function (loss or error) with respect to the learnable parameters.

This is called backpropagation (aka the chain rule).

error

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning
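
A hedged PyTorch sketch of this loop (a toy model and toy data, not anything from the talk): evaluate the cost function on the model’s output, call backward() to backpropagate, and step the optimizer:

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    x, target = torch.randn(8, 16), torch.randn(8, 1)   # a toy batch

    prediction = model(x)
    loss = torch.nn.functional.mse_loss(prediction, target)  # cost function (error)
    loss.backward()      # backpropagation: d(loss)/d(parameter) via the chain rule
    optimizer.step()     # update the learnable parameters
    optimizer.zero_grad()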

113 of 114

113

http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

model | deep learning | data | cost function | way to update parameters | conv. nets

convolutional neural networks:

Used for images. In each layer, scan over image with learned filters.

A brief primer on deep learning
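
A short SciPy sketch of “scan over the image with a learned filter,” which also makes the earlier translation-equivariance point concrete (periodic boundaries are assumed so the check is exact; the random kernel stands in for a learned one):

    import numpy as np
    from scipy.ndimage import correlate

    image = np.random.randn(32, 32)
    kernel = np.random.randn(3, 3)   # stands in for a learned filter

    def conv(img):
        # Scan the filter over the image (periodic / wrap-around boundaries).
        return correlate(img, kernel, mode="wrap")

    shift_then_conv = conv(np.roll(image, shift=(5, -3), axis=(0, 1)))
    conv_then_shift = np.roll(conv(image), shift=(5, -3), axis=(0, 1))
    assert np.allclose(shift_then_conv, conv_then_shift)  # translation equivariance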

114 of 114

114

model | deep learning | data | cost function | way to update parameters | conv. nets

http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/

convolutional neural networks:

Used for images. In each layer, scan over image with learned filters.

A brief primer on deep learning
