1 of 125

Tess Smidt

2018 Alvarez Fellow

in Computing Sciences

Neural networks with Euclidean Symmetry for the Physical Sciences

Physics ∩ ML

2020.11.18

SLIDES: https://tinyurl.com/e3nn-physics-meets-ml

2 of 125

Tess Smidt

2018 Alvarez Fellow

in Computing Sciences

All I wanted was 3D rotation equivariance and I got...

....geometric tensors, space groups, point groups, selection rules, normal modes, degeneracy, 2nd order phase transitions, and a much better understanding of physics.

Neural networks with Euclidean Symmetry for the Physical Sciences

Physics ∩ ML

2020.11.18

SLIDES: https://tinyurl.com/e3nn-physics-meets-ml

3 of 125

3

The laws of physics have rotational, translational,

and (unless you’re a particle physicist) parity symmetry.

We want machine learning models that also obey this symmetry.

e.g. a network is our model of physics. The input to the network is our system.

[Figure: point charges q and a magnetic field B]

4 of 125

4

Symmetry emerges when different ways of representing something “mean” the same thing.

Symmetry of representation vs. objects

5 of 125

5

Symmetry emerges when different ways of representing something “mean” the same thing.

Symmetry of representation vs. objects

Euclidean symmetry, E(3):

Symmetry of 3D space

The freedom to choose your coordinate system

6 of 125

We transform between coordinate systems with...

3D Translation

3D Rotation

3D Inversion / Mirrors

Symmetry emerges when different ways of representing something “mean” the same thing.

Symmetry of representation vs. objects

Euclidean symmetry, E(3):

Symmetry of 3D space

The freedom to choose your coordinate system

7 of 125

7

Symmetry of geometric objects

Looks the same under specific rotations, translations, and inversion (includes mirrors).

Symmetry emerges when different ways of representing something “mean” the same thing.

Symmetry of representation vs. objects

Euclidean symmetry, E(3):

Symmetry of 3D space

The freedom to choose your coordinate system

8 of 125

8

Symmetry of geometric objects

Looks the same under specific rotations, translations, and inversion (includes mirrors).

Symmetry emerges when different ways of representing something “mean” the same thing.

Symmetry of representation vs. objects

Euclidean symmetry, E(3):

Symmetry of 3D space

The freedom to choose your coordinate system

9 of 125

9

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Arrays ⇨ Dense NN: Components are independent.

2D images ⇨ Convolutional NN: The same features can be found anywhere in an image. Locality.

Text ⇨ Recurrent NN: Sequential data. Next input/output depends on input/output that has come before.

Graph ⇨ Graph (Conv.) NN: Topological data. Nodes have features and the network passes messages between nodes connected via edges.

3D physical data ⇨ Euclidean NN: Data in 3D Euclidean space. Freedom to choose coordinate system.

10 of 125

10

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Thus, symmetries are encoded by tailoring network operations.

Arrays ⇨ Dense NN: Components are independent. (No symmetry!)

2D images ⇨ Convolutional NN: The same features can be found anywhere in an image. Locality. (2D-translation symmetry)

Text ⇨ Recurrent NN: Sequential data. Next input/output depends on input/output that has come before. ((Forward) time-translation symmetry)

Graph ⇨ Graph (Conv.) NN: Topological data. Nodes have features and the network passes messages between nodes connected via edges. (Permutation symmetry)

3D physical data ⇨ Euclidean NN: Data in 3D Euclidean space. Equivariant to choice of coordinate system. (3D Euclidean symmetry E(3): 3D rotations, translations, and inversion)

11 of 125

11

H -0.21463 0.97837 0.33136

C -0.38325 0.66317 -0.70334

C -1.57552 0.03829 -1.05450

H -2.34514 -0.13834 -0.29630

C -1.78983 -0.36233 -2.36935

H -2.72799 -0.85413 -2.64566

C -0.81200 -0.13809 -3.33310

H -0.98066 -0.45335 -4.36774

C 0.38026 0.48673 -2.98192

H 1.14976 0.66307 -3.74025

C 0.59460 0.88737 -1.66708

H 1.53276 1.37906 -1.39070

Coordinates are most general, but sensitive to translations, rotations, and inversion.

Three ways to make models “symmetry-aware” for 3D data

e.g. How to make a model that “understands” the symmetry of atomic structures?

12 of 125

12

Approach 1: Data Augmentation

Throw data at the problem and see what you get!

Approach 2: Invariant Inputs (invariant models)

Convert your data to invariant representations so the neural network can’t possibly mess it up.

Approach 3: Equivariant models

If there’s no model that naturally handles coordinates, we will make one.

H -0.21463 0.97837 0.33136

C -0.38325 0.66317 -0.70334

C -1.57552 0.03829 -1.05450

H -2.34514 -0.13834 -0.29630

C -1.78983 -0.36233 -2.36935

H -2.72799 -0.85413 -2.64566

C -0.81200 -0.13809 -3.33310

H -0.98066 -0.45335 -4.36774

C 0.38026 0.48673 -2.98192

H 1.14976 0.66307 -3.74025

C 0.59460 0.88737 -1.66708

H 1.53276 1.37906 -1.39070

Coordinates are most general, but sensitive to translations, rotations, and inversion.

Three ways to make models “symmetry-aware” for 3D data

e.g. How to make a model that “understands” the symmetry of atomic structures?

13 of 125

13

Approach 1: Data Augmentation

Throw data at the problem and see what you get!

Approach 2: Invariant Inputs (invariant models)

Convert your data to invariant representations so the neural network can’t possibly mess it up.

Approach 3: Equivariant models

If there’s no model that naturally handles coordinates, we will make one.

👌

😭

😍

H -0.21463 0.97837 0.33136

C -0.38325 0.66317 -0.70334

C -1.57552 0.03829 -1.05450

H -2.34514 -0.13834 -0.29630

C -1.78983 -0.36233 -2.36935

H -2.72799 -0.85413 -2.64566

C -0.81200 -0.13809 -3.33310

H -0.98066 -0.45335 -4.36774

C 0.38026 0.48673 -2.98192

H 1.14976 0.66307 -3.74025

C 0.59460 0.88737 -1.66708

H 1.53276 1.37906 -1.39070

Coordinates are most general, but sensitive to translations, rotations, and inversion.

Three ways to make models “symmetry-aware” for 3D data

e.g. How to make a model that “understands” the symmetry of atomic structures?

14 of 125

14

Invariance vs. Equivariance (covariance) e.g. in 3D space

Does NOT change ⇨ Invariant

Changes deterministically ⇨ Equivariant

Properties of a vector under E(3)

[Figure: a 3D vector under translation, rotation, and inversion]

15 of 125

15

For 3D data, data augmentation is expensive (~500-fold augmentation), and you still don’t get a guarantee of equivariance (it’s only emulated).

[Plot: training error with vs. without rotational data augmentation]
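For contrast, a minimal sketch of what brute-force rotational augmentation looks like for point data (the function name and the use of scipy here are illustrative, not from the talk):

import numpy as np
from scipy.spatial.transform import Rotation

def augment_with_rotations(coords, vector_targets, n_copies=500):
    # Naive ~500-fold augmentation: replicate each example under random 3D rotations.
    # coords: [n_atoms, 3] positions; vector_targets: [n_atoms, 3] per-atom vectors (e.g. forces).
    augmented = []
    for _ in range(n_copies):
        R = Rotation.random().as_matrix()                        # random rotation matrix
        augmented.append((coords @ R.T, vector_targets @ R.T))   # rotate inputs AND vector labels
    return augmented

Every example is copied ~500 times, and the network still only learns approximate equivariance; an equivariant architecture gets the exact constraint with no extra data.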

16 of 125

16

For a function to be equivariant means that we can act on our inputs with g OR act on our outputs with g and get the same answer (for every group operation).

For a function with invariant inputs (e.g. invariant models), g acts as the identity (no change).

[Diagram: applying g to the input and then the Layer gives the same result as applying the Layer and then g to the output]
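A minimal numerical version of this diagram (a toy layer invented for illustration; any equivariant layer satisfies the same check):

import torch
from scipy.spatial.transform import Rotation

def toy_equivariant_layer(x):
    # Trivially rotation-equivariant: scale each vector by a function of its (invariant) norm.
    return x * torch.tanh(x.norm(dim=-1, keepdim=True))

x = torch.randn(10, 3)                                          # ten input vectors
g = torch.tensor(Rotation.random().as_matrix(), dtype=x.dtype)  # a rotation g

out1 = toy_equivariant_layer(x @ g.T)  # act with g on the input, then apply the layer
out2 = toy_equivariant_layer(x) @ g.T  # apply the layer, then act with g on the output
assert torch.allclose(out1, out2, atol=1e-6)  # same answer (for every g): the layer is equivariant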

17 of 125

17

Why limit yourself to equivariant functions?

You can substantially shrink the space of functions you need to optimize over.

This means you need less data to constrain your function.

All learnable functions

All learnable equivariant functions

All learnable functions constrained by your data.

Functions you actually wanted to learn.

18 of 125

18

Why not limit yourself to invariant functions?

You have to guarantee that your input features already

contain any necessary equivariant interactions (e.g. cross-products).

All learnable equivariant functions

Functions you actually wanted to learn.

All learnable invariant functions.

All invariant functions constrained by your data.

OR

19 of 125

How Euclidean Neural Networks achieve equivariance to Euclidean symmetry

(high level)

Euclidean Neural Networks encompass

Tensor Field Networks (arXiv:1802.08219)

Clebsch-Gordan Nets (arXiv:1806.09231)

3D Steerable CNNs (arXiv:1807.02547)

Cormorant (arXiv:1906.04015)

SE(3)-Transformers (arXiv:2006.10503)

e3nn (github.com/e3nn/e3nn)

(Technically, e3nn is the only one that implements inversion)

Some relevant folks… Mario Geiger, Ben Miller, Risi Kondor, Taco Cohen, Maurice Weiler, Daniel E. Worrall, Fabian B. Fuchs, Max Welling, Nathaniel Thomas, Shubhendu Trivedi,...

20 of 125

Euclidean Neural Networks are similar to convolutional neural networks...

21 of 125

Equivariant convolutional filters are based on learned radial functions and spherical harmonics...

[Figure: a filter built from a learned radial function and spherical harmonics, evaluated on the vectors from the convolution center to the neighbor atoms]

Euclidean Neural Networks are similar to convolutional neural networks...

22 of 125

Equivariant convolutional filters are based on learned radial functions and spherical harmonics...

[Figure: a filter built from a learned radial function and spherical harmonics, evaluated on the vectors from the convolution center to the neighbor atoms]

Euclidean Neural Networks are similar to convolutional neural networks...

Spherical harmonics of the same L transform together under rotation g.

Spherical harmonics transform in the same manner as the irreducible representations of SO(3).

23 of 125

Equivariant convolutional filters are based on learned radial functions and spherical harmonics...

[Figure: a filter built from a learned radial function and spherical harmonics, evaluated on the vectors from the convolution center to the neighbor atoms]

Euclidean Neural Networks are similar to convolutional neural networks...

24 of 125

Equivariant convolutional filters are based on learned radial functions and spherical harmonics...

[Figure: a filter built from a learned radial function and spherical harmonics, evaluated on the vectors from the convolution center to the neighbor atoms]

Euclidean Neural Networks are similar to convolutional neural networks...

...and geometric tensor algebra allows us to generalize scalar operations to more complex geometric tensors.

e.g. How to multiply two vectors?

scalar

vector

3x3 matrix

25 of 125

Equivariant convolutional filters are based on learned radial functions and spherical harmonics...

[Figure: a filter built from a learned radial function and spherical harmonics, evaluated on the vectors from the convolution center to the neighbor atoms]

Euclidean Neural Networks are similar to convolutional neural networks...

...and geometric tensor algebra allows us to generalize scalar operations to more complex geometric tensors.

e.g. How to multiply two vectors?

scalar

vector

3x3 matrix
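A concrete sketch of that decomposition in plain PyTorch (not the e3nn tensor-product code): the product of two vectors splits into a scalar (L=0), a vector (L=1), and a symmetric traceless piece (L=2), which together carry the 9 numbers of the 3×3 outer product.

import torch

a, b = torch.randn(3), torch.randn(3)

outer = torch.outer(a, b)                 # full 3x3 matrix: 9 degrees of freedom
scalar = torch.dot(a, b)                  # L=0 part (1 dof): the dot product
vector = torch.linalg.cross(a, b)         # L=1 part (3 dof): the cross product (a pseudovector)
sym_traceless = 0.5 * (outer + outer.T) - (scalar / 3) * torch.eye(3)  # L=2 part (5 dof)

# The pieces reassemble into the full outer product:
antisym = 0.5 * (outer - outer.T)         # antisymmetric part; carries the same 3 dof as the cross product
assert torch.allclose(outer, (scalar / 3) * torch.eye(3) + antisym + sym_traceless, atol=1e-6)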

26 of 125

26

The input to our network is geometry and features on that geometry.

geometry = [[x0, y0, z0],[x1, y1, z1]]

features = [

[m0, v0y, v0z, v0x, a0y, a0z, a0x]

[m1, v1y, v1z, v1x, a1y, a1z, a1x]

]

...

27 of 125

27

geometry = [[x0, y0, z0],[x1, y1, z1]]

features = [

[m0, v0y, v0z, v0x, a0y, a0z, a0x]

[m1, v1y, v1z, v1x, a1y, a1z, a1x]

]

Rs = [(1, 0, 1), (1, 1, -1), (1, 1, -1)]

# OR

Rs = [(1, 0, 1), (2, 1, -1)]

1 Scalar (L=0), even parity

2 Vectors (L=1), odd parity

“Representation List” notation: Rs = [(copies, L, parity), ...]

The input to our network is geometry and features on that geometry.

We categorize our features by how they transform under rotation and parity

as irreducible representations of O(3).
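A small sketch of what this bookkeeping implies for the feature dimension, using the rs helper that appears later in the deck (e3nn’s 2020-era API):

from e3nn import rs

# 1 scalar (L=0, even parity) + 2 vectors (L=1, odd parity): mass, velocity, acceleration
Rs = [(1, 0, 1), (2, 1, -1)]

# Each copy of an irrep of order L contributes (2L + 1) components:
# 1 * (2*0 + 1) + 2 * (2*1 + 1) = 1 + 6 = 7, matching [m, vy, vz, vx, ay, az, ax]
print(rs.dim(Rs))  # -> 7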

28 of 125

What does equivariance get you?

29 of 125

29

Given a molecule and a rotated copy,

predicted forces are the same up to rotation.

(Predicted forces are equivariant to rotation.)

Additionally, networks generalize to molecules with similar motifs.

30 of 125

30

Primitive unit cells, conventional unit cells, and supercells of the same crystal produce the same output (assuming periodic boundary conditions).

31 of 125

31

O: 1s 2s 2s 2p 2p 3d

H: 1s 2s 2p

H: 1s 2s 2p

E(3)NNs can express tensors of atomic orbitals and predict molecular Hamiltonians in any orientation from seeing a single example.

32 of 125

geometry

features

We can convert local geometry into features and vice versa

via spherical harmonic projections.

E(3)NNs can manipulate geometry,

which means they can be used for generative models such as autoencoders.

33 of 125

E(3)NNs are extremely data efficient.

34 of 125

E(3)NNs for molecular dynamics (coming soon)

Water

Data from: Zhang, L. et al. (2018). PRL 120(14), 143001.

Simon Batzner, Boris Kozinsky

E(3)NNs are extremely data efficient.


35 of 125

35

E(3)NNs for molecular dynamics (coming soon)

Water

E(3)NNs for phonon density of states (arxiv:2009.05163)

Training set of 1,200 crystal structures with 64 atom types.

Test set includes atom types never seen. Used to find high Cv.

Data from Materials Project

Simon Batzner, Boris Kozinsky, Zhantao Chen, Nina Andrejevic, Mingda Li

E(3)NNs are extremely data efficient.

Data from: Zhang, L. et al. (2018). PRL 120(14), 143001.


36 of 125

Features that are consequences of fully treating Euclidean symmetry...

37 of 125

Vector

Pseudovector

Double-Headed Ray

Spiral

Rs_vector = [(1, 1, -1)]

Rs_pseudovector = [(1, 1, 1)]

Rs_doubleray = [(1, 2, 1)]

Rs_spiral = [(1, 2, -1)]

Feature 1: All data (input, intermediates, output) in E(3)NNs are geometric tensors.

Geometric tensors are the “data types” of 3D space and have many forms.

38 of 125

Feature 1: All data (input, intermediates, output) in E(3)NNs are geometric tensors.

Geometric tensors are the “data types” of 3D space and have many forms.

Rs_s_orbital = [(1, 0, 1)]

Rs_p_orbital = [(1, 1, -1)]

Rs_d_orbital = [(1, 2, 1)]

Rs_f_orbital = [(1, 3, -1)]
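The parity entries follow from how these objects behave under inversion (a standard result, stated here in LaTeX for context):

Y_{\ell m}(-\hat{r}) = (-1)^{\ell}\, Y_{\ell m}(\hat{r})
\quad\Rightarrow\quad
\text{s}\ (\ell = 0)\ \text{and d}\ (\ell = 2)\ \text{orbitals have parity } +1;\quad
\text{p}\ (\ell = 1)\ \text{and f}\ (\ell = 3)\ \text{orbitals have parity } -1.

A pseudovector is the L=1 irrep with even parity: it rotates like a vector but does not flip under inversion, which is exactly how Rs_pseudovector = [(1, 1, 1)] differs from Rs_vector = [(1, 1, -1)].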

39 of 125

39

Feature 2: The outputs have equal or higher symmetry than the inputs.

Curie’s principle (1894): “When effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them.”

[Figure: outputs of three randomly initialized models for a tetrahedron input and an octahedron input]

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

40 of 125

40

Feature 2: The outputs have equal or higher symmetry than the inputs.

Curie’s principle (1894): “When effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them.”

[Figure: outputs of three randomly initialized models for a tetrahedron input and an octahedron input]

Implement group equivariance and get all subgroups for FREE!

e.g. space groups, point groups

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

41 of 125

41

Feature 2: The outputs have equal or higher symmetry than the inputs.

Symmetry compiler -- can’t fit a model that doesn’t make sense symmetrically

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

42 of 125

42

Feature 2: The outputs have equal or higher symmetry than the inputs.

Symmetry compiler -- can’t fit a model that doesn’t make sense symmetrically

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

43 of 125

43

Feature 2: The outputs have equal or higher symmetry than the inputs.

Symmetry compiler -- can’t fit a model that doesn’t make sense symmetrically

Network predicts degenerate outcomes!

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

44 of 125

44

Feature 2: The outputs have equal or higher symmetry than the inputs.

Symmetry compiler -- can’t fit a model that doesn’t make sense symmetrically

Network predicts degenerate outcomes!

The network does NOT know the symmetry of the inputs or outputs! It only acts equivariantly.

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

45 of 125

45

Feature 3: We can find data that is implied by symmetry.

Using gradients of the loss with respect to the input, we can find symmetry-breaking “order parameters” -- use gradients to “find” what’s missing.

[Figure: with a trivial L = 0 input the model cannot fit the anisotropic output; it learns anisotropic inputs (L = 0 + 2 + 4), and then the model can fit.]

Irreps with even parity and L ≥ 2 break the degeneracy between the x and y directions.

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

46 of 125

46

Feature 3: We can find data that is implied by symmetry.

Using gradients of the loss with respect to the input, we can find symmetry-breaking “order parameters”

Octahedral tilting in perovskites (M3+ ⊕ R4+) ⇨

Network learns equal magnitude pseudovector order parameters on B site with proper spatial patterning.

T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)

47 of 125

47

developers of e3nn

Mario Geiger (EPFL), Ben Miller (U of Amsterdam, formerly FU Berlin), Tess Smidt (LBNL), Kostiantyn Lapchevskyi

e3nn: a modular PyTorch framework for Euclidean neural networks

https://github.com/e3nn/e3nn

Utilities and classes for

  • building E(3) equivariant neural networks
  • manipulating geometric tensors
  • visualizing spherical harmonics

48 of 125

48

e3nn: a modular PyTorch framework for Euclidean neural networks

https://github.com/e3nn/e3nn

Creating a basic convolution E(3) neural network

import torch

from e3nn import rs

from e3nn.networks import GatedConvParityNetwork

torch.set_default_dtype(torch.float64)

N_atom_types = 3 # For example H, C, O

Rs_in = [(N_atom_types, 0, 1)] # Inputs are scalars

Rs_out = [(1, 1, -1)] # Predict vectors

r_max = 3.0 # cutoff radius for the convolutions (example value)

model_kwargs = {
    'Rs_in': Rs_in, 'Rs_out': Rs_out, 'mul': 4, 'lmax': 2,
    'layers': 3, 'max_radius': r_max, 'number_of_basis': 10,
}

model = GatedConvParityNetwork(**model_kwargs)
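A hedged usage sketch for the model above (shapes follow the pattern features = [batch, atoms, dim(Rs_in)], geometry = [batch, atoms, 3]; treat the exact forward signature as an assumption about the 2020-era API):

import torch.nn.functional as F

atom_types = torch.randint(0, N_atom_types, (1, 12))              # a 12-atom example, random species
features = F.one_hot(atom_types, N_atom_types).to(torch.float64)  # [batch, atoms, dim(Rs_in)] scalars
geometry = torch.randn(1, 12, 3)                                  # [batch, atoms, 3] coordinates
output = model(features, geometry)                                # expected: [batch, atoms, 3], one vector per atom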

49 of 125

49

e3nn: a modular PyTorch framework for Euclidean neural networks

https://github.com/e3nn/e3nn

Convert between Cartesian tensors (with symmetric indices) and Irrep tensors and calculate degrees of freedom (e.g. elasticity tensor)

import torch

from e3nn import rs

from e3nn.tensor import CartesianTensor

torch.set_default_dtype(torch.float64)

rank4 = torch.zeros(3, 3, 3, 3) # Placeholder

Rs, Q = CartesianTensor(rank4, 'ijkl=jikl=klij').to_irrep_transformation()

print("Representations: ", Rs)

print("Degrees of freedom: ", rs.dim(Rs))

>> Representations: [(2, 0, 1), (2, 2, 1), (1, 4, 1)]

>> Degrees of freedom: 21

50 of 125

50

e3nn: a modular PyTorch framework for Euclidean neural networks

https://github.com/e3nn/e3nn

Plot 3x3 matrix as linear combination of spherical harmonics.

import torch
import plotly.express as px
import plotly.graph_objects as go
from e3nn.tensor import CartesianTensor, SphericalTensor

# Symmetric Matrix
M = torch.randn(3, 3)
M = M + M.transpose(0, 1)

# Plot matrix
px.imshow(M)

# Convert to an irrep tensor and project onto spherical harmonics
matrix = CartesianTensor(M, formula='ij=ji').to_irrep_tensor()
r, f = SphericalTensor.from_irrep_tensor(matrix).plot()

# Plot SH signal
surface_plot = lambda r, f: go.Surface(
    x=r[..., 0], y=r[..., 1], z=r[..., 2],
    surfacecolor=f, showscale=False)
go.Figure([surface_plot(r, f)])

51 of 125

collaborators of e3nn

Boris Kozinsky, Simon Batzner, Josh Rackers, Thomas Hardin, Eugene Kwan, Frank Noé, Mingda Li, Nina Andrejevic, Zhantao Chen, Claire West

52 of 125

52

Feel free to reach out if you have any questions!

Tess Smidt

tsmidt@lbl.gov

A Quick Recap!

3D Euclidean symmetry:

rotations, translation, inversion

Different coordinate systems

⇨ same physical system

Euclidean Neural Networks are equivariant to E(3)

Convolutional filters

learned radial functions

and spherical harmonics

Geometric tensor algebra

Equivariant nonlinearities (did not discuss)

Equivariance can have unintended features.

1) Symmetry specific data types

2) Output symmetry equal to inputs

  • Implement group equivariance and get all subgroups for FREE!
  • Symmetry compilers

3) Grad loss wrt input can break symmetry

53 of 125

53

Resources on Euclidean neural networks:

e3nn.org

e3nn Code (PyTorch):

http://github.com/e3nn/e3nn

“quick” tutorial: https://tinyurl.com/e3nn-quick-tutorial-202011

e3nn_tutorial:

http://blondegeek.github.io/e3nn_tutorial/

Papers:

Tensor Field Networks (arXiv:1802.08219)

Clebsch-Gordan Nets (arXiv:1806.09231)

3D Steerable CNNs (arXiv:1807.02547)

Cormorant (arXiv:1906.04015)

SE(3)-Transformers (arXiv:2006.10503)

tfnns on proteins (arXiv:2006.09275)

e3nn on QM9 (arXiv:2008.08461)

e3nn for symmetry breaking (arXiv:2007.02005)

E(3) and equivariance in ML (chemrxiv.12935198.v1)

e3nn for phonon DOS (arxiv:2009.05163)

My past talks (look for video / slide links):

https://blondegeek.github.io/talks

Feel free to reach out if you have any questions!

Tess Smidt

tsmidt@lbl.gov

A Quick Recap!

3D Euclidean symmetry:

rotations, translation, inversion

Different coordinate systems

⇨ same physical system

Euclidean Neural Networks are equivariant to E(3)

Convolutional filters

learned radial functions

and spherical harmonics

Geometric tensor algebra

Equivariant nonlinearities (did not discuss)

Equivariance can have unintended features.

1) Symmetry specific data types

2) Output symmetry equal to inputs

  • Implement group equivariance and get all subgroups for FREE!
  • Symmetry compilers

3) Grad loss wrt input can break symmetry

54 of 125

54

Calling in backup (slides)!

55 of 125

55

Applications so far...

  • Finding order parameters of 2nd order structural phase transitions
  • Molecular dynamics (Harvard)
  • Molecule and crystal property prediction (FU Berlin)
  • Inverting invariant representations of atomic geometries (Sandia)
  • Autoencoding Geometry
  • Predicting molecular Hamiltonians (TU Berlin)
  • Long range interactions (FU Berlin, TU Berlin)
  • Electron density prediction for large molecules (Sandia)
  • Predicting chemical shifts for NMR (Merck, MIT)
  • Conditional protein design (UW)
  • Inverse design of optical properties of nanoparticle assemblies (LBL and UW)
  • Phonon properties of crystal structures (MIT)
  • Anharmonic elastic properties of crystal structures (UTEP)
  • ...

56 of 125

56

Let g be an element of SO(3).

Spherical harmonics of a given L transform together under rotation:

a-1 Y(1,-1) + a0 Y(1,0) + a1 Y(1,1)  --g-->  b-1 Y(1,-1) + b0 Y(1,0) + b1 Y(1,1), with b = D(g) a

D is the Wigner-D matrix. It has shape (2L+1) × (2L+1) and is a function of g.
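A small numerical check of this statement, written against the current e3nn API (o3.rand_angles, o3.angles_to_matrix, o3.spherical_harmonics, o3.wigner_D -- the 2020-era API used different names, so treat the exact functions as assumptions):

import torch
from e3nn import o3

L = 1
alpha, beta, gamma = o3.rand_angles()        # a random rotation g as Euler angles
R = o3.angles_to_matrix(alpha, beta, gamma)  # g as a 3x3 rotation matrix
D = o3.wigner_D(L, alpha, beta, gamma)       # Wigner-D matrix of shape (2L+1) x (2L+1)

x = torch.randn(10, 3)
a = o3.spherical_harmonics(L, x, normalize=True)        # coefficients a for each point
b = o3.spherical_harmonics(L, x @ R.T, normalize=True)  # coefficients b for the rotated points

assert torch.allclose(b, a @ D.T, atol=1e-5)  # b = D(g) a: the L=1 harmonics mix only among themselves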

57 of 125

57

Predict ab initio forces for molecular dynamics

Preliminary results originally presented at

APS March Meeting 2019.

Paper in progress.

Testing on liquid water, Euclidean neural networks (Tensor-Field Molecular Dynamics) require less data to train than traditional networks to get state of the art results.

Data set from: [1] Zhang, L. et al. (2018). PRL 120(14), 143001.

Boris Kozinsky, Simon Batzner

58 of 125

Euclidean neural networks can manipulate geometry,

which means they can be used for generative models such as autoencoders.

59 of 125

geometry

features

To encode/decode, we have to be able to convert geometry into features and vice versa.

We do this via spherical harmonic projections.

Euclidean neural networks can manipulate geometry,

which means they can be used for generative models such as autoencoders.

60 of 125

60

Equivariant neural networks can learn to invert invariant representations.

Which can be used to recover geometry.

Network can predict spherical harmonic projection...

Invariant features + coordinate frame

ENN

Peak finding

Josh Rackers, Thomas Hardin

61 of 125

We can also build an autoencoder for geometry: e.g. an autoencoder on 3D Tetris.

[Diagram: two Pooling layers encode the shape (centers deleted at each step), two Unpooling layers decode it]

62 of 125

We can also build an autoencoder for geometry: e.g. an autoencoder on 3D Tetris.

[Diagram: Pooling ⇨ Pooling ⇨ Unpooling ⇨ Unpooling]

63 of 125

63

Euclidean Neural Networks are similar to convolutional neural networks...

We encode the symmetries of 3D Euclidean space (3D translation- and 3D rotation-equivariance).

We use points -- images of atomic systems are sparse and imprecise.

We use continuous convolutions with atoms as convolution centers. [Figure: convolution center and other atoms]

64 of 125

64

Euclidean Neural Networks are similar to convolutional neural networks...

We encode the symmetries of 3D Euclidean space (3D translation- and 3D rotation-equivariance).

We use points -- images of atomic systems are sparse and imprecise.

We use continuous convolutions with atoms as convolution centers. [Figure: convolution center and other atoms]

65 of 125

65

Euclidean Neural Networks are similar to convolutional neural networks...

We encode the symmetries of 3D Euclidean space (3D translation- and 3D rotation-equivariance).

We use points -- images of atomic systems are sparse and imprecise.

We use continuous convolutions with atoms as convolution centers. [Figure: convolution center and other atoms]

66 of 125

66

Translation equivariance ⇨ Convolutional neural network

Rotation equivariance ⇨ Data augmentation or Radial functions (invariant)

Want a network that both preserves geometry and exploits symmetry.

67 of 125

Invariant featurizations can be very expressive if well-crafted

Many invariant featurizations use equivariant operations

e.g. a (simplified) SOAP kernel for ethane molecule C2H6

  1. Project neighbors of a given atom onto spherical harmonics (equivariant quantity).
  2. Interact signals from different atoms via tensor dot product (equivariant operation) to produce scalars (invariant quantity).
  3. Give scalars to the model.

(Favored for kernel methods)
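A minimal sketch of steps 1–2 (a toy power-spectrum invariant, not the real SOAP kernel; the function name is illustrative):

import numpy as np
from scipy.special import sph_harm
from scipy.spatial.transform import Rotation

def power_spectrum(neighbors, lmax=3):
    # (1) Project neighbor directions onto spherical harmonics (equivariant),
    # (2) contract each L with itself to get rotation-invariant scalars.
    x, y, z = neighbors[:, 0], neighbors[:, 1], neighbors[:, 2]
    theta = np.arctan2(y, x)                                # azimuthal angle
    phi = np.arccos(z / np.linalg.norm(neighbors, axis=1))  # polar angle
    p = []
    for l in range(lmax + 1):
        c = np.array([sph_harm(m, l, theta, phi).sum() for m in range(-l, l + 1)])
        p.append(np.real(np.vdot(c, c)))                    # sum_m |c_lm|^2
    return np.array(p)

neighbors = np.random.randn(6, 3)              # e.g. an atom's neighbor positions
R = Rotation.random().as_matrix()
assert np.allclose(power_spectrum(neighbors), power_spectrum(neighbors @ R.T))  # invariant under rotation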

68 of 125

68

For a function to be equivariant means that we can act on our inputs with g OR act on our outputs with g and get the same answer (for every group operation).

For a function to be invariant means that g acts as the identity on the output (no change).

[Diagram: applying g to the input and then the Layer gives the same result as applying the Layer and then g to the output]

69 of 125

69

Why limit yourself to equivariant functions?

You can substantially shrink the space of functions you need to optimize over.

This means you need less data to constrain your function.

All learnable functions

All learnable equivariant functions

All learnable functions constrained by your data.

Functions you actually wanted to learn.

70 of 125

70

Why not limit yourself to invariant functions?

You have to guarantee that your input features already

contain any necessary equivariant interactions (e.g. cross-products).

All learnable equivariant functions

Functions you actually wanted to learn.

All learnable invariant functions.

All invariant functions constrained by your data.

OR

71 of 125

71

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Arrays ⇨ Dense NN: Components are independent.

2D images ⇨ Convolutional NN: The same features can be found anywhere in an image. Locality.

Text ⇨ Recurrent NN: Sequential data. Next input/output depends on input/output that has come before.

Graph ⇨ Graph (Conv.) NN: Topological data. Nodes have features and the network passes messages between nodes connected via edges.

3D physical data ⇨ Euclidean NN: Data in 3D Euclidean space. Freedom to choose coordinate system.

72 of 125

72

Neural networks are specially designed for different data types.

Assumptions about the data type are built into how the network operates.

Symmetries emerge from these assumptions.

Arrays ⇨ Dense NN: Components are independent. (No symmetry!)

2D images ⇨ Convolutional NN: The same features can be found anywhere in an image. Locality. (2D-translation symmetry)

Text ⇨ Recurrent NN: Sequential data. Next input/output depends on input/output that has come before. ((Forward) time-translation symmetry)

Graph ⇨ Graph (Conv.) NN: Topological data. Nodes have features and the network passes messages between nodes connected via edges. (Permutation symmetry)

3D physical data ⇨ Euclidean NN: Data in 3D Euclidean space. Equivariant to choice of coordinate system. (3D Euclidean symmetry E(3): 3D rotations, translations, and inversion)

73 of 125

If you can craft a good representation -- great!

But deep learning’s specialty is feature learning.

So, maybe use a different machine learning approach (e.g. kernel methods).

Neural networks can’t mess up invariant representations.

You can use ANY neural network with an invariant representation.

Invariant representations can be used for other machine learning algorithms

(e.g. kernel methods).

74 of 125

74

Analogous to... the laws of (non-relativistic) physics have Euclidean symmetry,

even if systems do not.

The network is our model of physics. The input to the network is our system.

[Figure: point charges q and a magnetic field B]

75 of 125

75

A Euclidean symmetry preserving network produces outputs that preserve

the subset of symmetries induced by the input.

O(3): 3D rotations and inversions

SO(2) + mirrors (C∞v): 2D rotation and mirrors along the cone axis

Oh: discrete rotations and mirrors

Pm-3m (221): discrete rotations, mirrors, and translations

76 of 125

76

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

[Figure: candidate input/output pairs a, b, and c, each with mirror planes m]

77 of 125

77

[Figure: candidate input/output pairs a, b, and c, each with mirror planes m]

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

78 of 125

78

[Figure: candidate input/output pairs a, b, and c with mirror planes m; the allowed outputs are annotated with their symmetry (m, 2m)]

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

79 of 125

79

[Figure: candidate input/output pairs a, b, and c with mirror planes m; annotations (m, 2m, g) mark the symmetry of each case]

Properties of a system must be compatible with symmetry.

Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?

80 of 125

80

Equivariance can have unintuitive consequences.

Partition a graph into two sets with a permutation-equivariant function, using ordered labels.

Predict node labels

[0, 1] vs. [1, 0]

81 of 125

81

Equivariance can have unintuitive consequences.

Partition a graph into two sets with a permutation-equivariant function, using ordered labels.

You can’t due to degeneracy.

[0, 1]

[1, 0]

[0, 1]

[1, 0]

There’s nothing to distinguish one partition to be “first” vs. “second”.

Predict node labels

[0, 1] vs. [1, 0]

82 of 125

Convolutions: Local vs. Global Symmetry

Convolutions capture local symmetry. Interaction of features in later layers yields global symmetry.

e.g. Coordination environments in crystals

Atomic systems form geometric motifs that can appear at multiple locations and orientations.

(Local symmetry)

Space group:

Symmetry of unit cell

(Global symmetry)

83 of 125

83

Translation symmetry in 2D:

Features “mean” the same thing in any location.

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

84 of 125

84

Translation symmetry in 2D:

Features “mean” the same thing in any location.

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

Symmetry of 2D objects

Boundaries “break” global translation symmetry.

Periodic boundary conditions preserve

discrete translation symmetry.

85 of 125

85

Permutation symmetry, SN:

Symmetry of sets

The freedom to list things in any order

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

86 of 125

86

Permutation symmetry, SN:

Symmetry of sets

The freedom to list things in any order

Symmetry of elements of a graph

Graph automorphism, specific nodes are indistinguishable (same global connectivity)

Symmetry emerges when different ways of representing something “mean” the same thing.

Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.

87 of 125

A bit of group theory! Don’t worry just a bit!

Formally, what are invariant vs. equivariant functions

function (neural network)...

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

88 of 125

A bit of group theory! Don’t worry just a bit!

Formally, what are invariant vs. equivariant functions

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

89 of 125

A bit of group theory! Don’t worry just a bit!

Formally, what are invariant vs. equivariant functions

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

equivariant to x if

90 of 125

A bit of group theory! Don’t worry just a bit!

Formally, what are invariant vs. equivariant functions

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

If we want to be equivariant to x, this has to be the case…

weights must be “scalars”

equivariant to x if

91 of 125

A bit of group theory! Don’t worry just a bit!

Formally, what are invariant vs. equivariant functions

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

If we want to be equivariant to x, this has to be the case…

weights must be “scalars”

equivariant to x if

92 of 125

A bit of group theory! Don’t worry just a bit!

Formally, what are invariant vs. equivariant functions

function (neural network)...

element of group

representation of g acting on vector space

vector in vector space

inputs

outputs

weights

...which is equivalent to writing.

If we want to be equivariant to x, this has to be the case…

weights must be “scalars”

equivariant to x if

(special case) invariant to x if
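The equations on these slides are images; as a hedged reconstruction from the labels above (in LaTeX):

\begin{aligned}
&f(\mathbf{x};\, W) = \mathbf{y}
  \qquad \mathbf{x}\ \text{inputs},\ \mathbf{y}\ \text{outputs},\ W\ \text{weights},\ g \in G,\ D(g)\ \text{a representation of } g \\[4pt]
&\text{equivariant to } \mathbf{x}\ \text{if:}\quad
  f\!\big(D_{\text{in}}(g)\,\mathbf{x};\, W\big) = D_{\text{out}}(g)\, f(\mathbf{x};\, W)
  \quad \forall\, g \in G
  \qquad (\text{weights transform trivially, } D_W(g) = \mathbb{1}\text{: they are ``scalars''}) \\[4pt]
&\text{(special case) invariant to } \mathbf{x}\ \text{if:}\quad
  f\!\big(D_{\text{in}}(g)\,\mathbf{x};\, W\big) = f(\mathbf{x};\, W)
  \qquad (D_{\text{out}}(g) = \mathbb{1})
\end{aligned}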

93 of 125

93

M. Zaheer et al, Deep Sets, NeurIPS 2017

94 of 125

94

Convolutional neural networks can “cheat” by being sensitive to “boundaries”.

(e.g. Predict geodesics on projected maps with and without periodic boundary conditions)

User: Stebe

https://en.wikipedia.org/wiki/Gall-Peters_projection

Nodes can be distinguished due to differing topology by latitude (e.g. poles)!

Boundaries break symmetry.

Pixels cannot be distinguished due to translation equivariance.

95 of 125

95

In the physical sciences...

What are our data types?

3D geometry and geometric tensors...

...which transform predictably under 3D rotation, translation, and inversion.

These data types assume Euclidean symmetry.

⇨ Thus, we need neural networks that preserve Euclidean symmetry.

96 of 125

96

Scalars

  • Energy
  • Mass
  • Isotropic *

Vectors

  • Force
  • Velocity
  • Acceleration
  • Polarization

Pseudovectors

  • Angular momentum
  • Magnetic fields

Matrices, Tensors, …

  • Moment of Inertia
  • Polarizability
  • Interaction of multipoles
  • Elasticity tensor (rank 4)


Atomic orbitals

Output of Angular Fourier Transforms

Vector fields on spheres

(e.g. B-modes of the Cosmic Microwave Background)

Geometric tensors take many forms. They are a general data type beyond materials.

97 of 125

97

Our unit test: Trained on 3D Tetris shapes in one orientation,

these networks can perfectly identify these shapes in any orientation.

TRAIN

TEST

Chiral

98 of 125

98

Several groups converged on similar ideas around the same time.

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

(arXiv:1802.08219)

Tess Smidt*, Nathaniel Thomas*, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, Patrick Riley

Points, nonlinearity on norm of tensors

Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network

(arXiv:1806.09231)

Risi Kondor, Zhen Lin, Shubhendu Trivedi

Only use tensor product as nonlinearity, no radial function

3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data

(arXiv:1807.02547)

Mario Geiger*, Maurice Weiler*, Max Welling, Wouter Boomsma, Taco Cohen

Efficient framework for voxels, gated nonlinearity

*denotes equal contribution

99 of 125

99

Several groups converged on similar ideas around the same time.

Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

(arXiv:1802.08219)

Tess Smidt*, Nathaniel Thomas*, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, Patrick Riley

Points, nonlinearity on norm of tensors

Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network

(arXiv:1806.09231)

Risi Kondor, Zhen Lin, Shubhendu Trivedi

Only use tensor product as nonlinearity, no radial function

3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data

(arXiv:1807.02547)

Mario Geiger*, Maurice Weiler*, Max Welling, Wouter Boomsma, Taco Cohen

Efficient framework for voxels, gated nonlinearity

*denotes equal contribution

Tensor field networks + 3D steerable CNNs

= Euclidean neural networks (e3nn)

100 of 125

100

Let g be a 3D rotation matrix.

Spherical harmonics of a given L transform together under rotation:

a-1 Y(1,-1) + a0 Y(1,0) + a1 Y(1,1)  --g-->  b-1 Y(1,-1) + b0 Y(1,0) + b1 Y(1,1), with b = D(g) a

D is the Wigner-D matrix. It has shape (2L+1) × (2L+1) and is a function of g.

101 of 125

How to encode (Pooling layer): recursively convert geometry to features.

Geometry ⇨ Convolve ⇨ Bloom (make points to cluster) ⇨ Symmetric Cluster (cluster bloomed points) ⇨ Combine (convolve with the point origins of cluster members) ⇨ New Geometry

102 of 125

How to decode (Unpooling layer): recursively convert features to geometry.

Geometry ⇨ Convolve ⇨ Bloom (make new points) ⇨ Cluster (merge duplicate points) ⇨ Combine (convolve with the origin point of the new points) ⇨ New Geometry

103 of 125

103

We want to convert geometric information (3D coordinates of atomic positions) into features on a trivial geometry (a single point) and back again.

Discrete geometry ⇨ (reduce geometry to a single point) ⇨ single point with a continuous latent representation (N-dimensional vector) ⇨ (create geometry from a single point) ⇨ discrete geometry

104 of 125

104

We want to convert geometric information (3D coordinates of atomic positions) into features on a trivial geometry (a single point) and back again.

Atomic structures are hierarchical and can be constructed from recurring geometric motifs.

Discrete geometry ⇨ (reduce geometry to a single point) ⇨ single point with a continuous latent representation (N-dimensional vector) ⇨ (create geometry from a single point) ⇨ discrete geometry

105 of 125

105

We want to convert geometric information (3D coordinates of atomic positions) into features on a trivial geometry (a single point) and back again.

Atomic structures are hierarchical and can be constructed from recurring geometric motifs.

Reduce geometry to a single point: encode geometry, encode hierarchy. Create geometry from a single point: decode geometry, decode hierarchy. (Need to do this in a recursive manner.)

Discrete geometry ⇨ single point with a continuous latent representation (N-dimensional vector) ⇨ discrete geometry

106 of 125

To autoencode, we have to be able to convert geometry into features and vice versa.

We do this via spherical harmonic projections.

107 of 125

107

What a computational materials physicist does:

Given an atomic structure (e.g. Si), use quantum theory and supercomputers to determine where the electrons are and what the electrons are doing.

[Figure: band structure -- Energy (eV) vs. Momentum]

Structure ⇨ Properties

108 of 125

We want to use deep learning to speed up calculations, hypothesize new structures, perform inverse design, and organize these relations.

[Diagram: Structure ⇄ Properties via quantum theory / molecular dynamics + supercomputers, with arrows for Hypothesize, Inverse Design, Map, and "Zooooom!"]

109 of 125

We want to use deep learning to speed up calculations, hypothesize new structures, perform inverse design, and organize these relations.

[Diagram: Structure ⇄ Properties via quantum theory / molecular dynamics + supercomputers, with arrows for Hypothesize, Inverse Design, Map, and "Zooooom!"]

The problems start here

110 of 125

110

Given a single example of a degenerate solution,

it knows what other solutions are possible by symmetry.

(Useful for ensuring you’re not biasing your sampling.)

111 of 125

111

To be rotation-equivariant means that we can rotate our inputs

OR rotate our outputs and we get the same answer (for every operation).

[Diagram: rotating the input and then applying the Layer gives the same result as applying the Layer and then rotating the output]

112 of 125

112

For L=1 ⇨ L=1, the filters will be learned, radially-dependent linear combinations of the L = 0, 1, and 2 spherical harmonics.

[Animation: random filters for L=1 ⇨ L=1 (3 input L=1 channels by 3 output L=1 channels) shown for increasing r, 0 ≤ r ≤ rmax; the radial distance shows the magnitude as a function of angle, and color shows the sign (+/−)]
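The fact that only L = 0, 1, and 2 appear follows from the selection rules of the tensor product of irreps (standard result, in LaTeX):

\ell_{\text{filter}} \text{ couples } \ell_{\text{in}} \text{ to } \ell_{\text{out}}
\quad\text{only if}\quad
|\ell_{\text{in}} - \ell_{\text{out}}| \le \ell_{\text{filter}} \le \ell_{\text{in}} + \ell_{\text{out}},
\qquad\text{so for } \ell_{\text{in}} = \ell_{\text{out}} = 1:\ \ell_{\text{filter}} \in \{0, 1, 2\}.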

113 of 125

113

114 of 125

114

Predictions for Oh symmetry

Ground Truth

Prediction of network trained with symmetry breaking input and given symmetry breaking input along z.

Prediction of network trained with symmetry breaking input but given trivial input

(single scalar).

Superposition of 6 rotationally degenerate solutions.

115 of 125

115

A brief primer on deep learning

deep learning ⊂ machine learning ⊂ artificial intelligence

model | deep learning | data | cost function | way to update parameters | conv. nets

116 of 125

116

model (“neural network”):

Function with learnable parameters.

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

117 of 125

117

model (“neural network”):

Function with learnable parameters.

Linear transformation

Element-wise nonlinear function

Learned

Parameters

Ex: "Fully-connected" network
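A minimal sketch of one such layer in PyTorch (illustrative sizes, not from the slides):

import torch

W = torch.randn(16, 8, requires_grad=True)  # learned parameters (weights)
b = torch.zeros(16, requires_grad=True)     # learned parameters (bias)

x = torch.randn(8)                          # input
h = torch.relu(W @ x + b)                   # linear transformation, then element-wise nonlinear function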

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

118 of 125

118

model (“neural network”):

Function with learnable parameters.

Neural networks with multiple layers can learn more complicated functions.

Learned

Parameters

model | deep learning | data | cost function | way to update parameters | conv. nets

Ex: "Fully-connected" network

A brief primer on deep learning

119 of 125

119

model (“neural network”):

Function with learnable parameters.

Neural networks with multiple layers can learn more complicated functions.

Learned

Parameters

model | deep learning | data | cost function | way to update parameters | conv. nets

Ex: "Fully-connected" network

A brief primer on deep learning

120 of 125

120

deep learning:

Add more layers.

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

121 of 125

121

data:

Want lots of it. Model has many parameters. Don't want to easily overfit.

https://en.wikipedia.org/wiki/Overfitting

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

122 of 125

122

cost function:

A metric to assess how well the model is performing.

The cost function is evaluated on the output of the model.

Also called the loss or error.

model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

123 of 125

123

way to update parameters:

Construct a model that is differentiable

Easiest to do with differentiable programming frameworks: e.g. Torch, TensorFlow, JAX, ...

Take derivatives of the cost function (loss or error) with respect to the learnable parameters.

This is called backpropagation (a.k.a. the chain rule).
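A minimal sketch of one update step in PyTorch (illustrative):

import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x, target = torch.randn(32, 8), torch.randn(32, 1)
loss = ((model(x) - target) ** 2).mean()  # cost function (error), evaluated on the model's output
loss.backward()                           # backpropagation: d(loss)/d(parameters) via the chain rule
optimizer.step()                          # update the learnable parameters
optimizer.zero_grad()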


model | deep learning | data | cost function | way to update parameters | conv. nets

A brief primer on deep learning

124 of 125

124

http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

model | deep learning | data | cost function | way to update parameters | conv. nets

convolutional neural networks:

Used for images. In each layer, scan over image with learned filters.

A brief primer on deep learning

125 of 125

125

model | deep learning | data | cost function | way to update parameters | conv. nets

http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/

convolutional neural networks:

Used for images. In each layer, scan over image with learned filters.

A brief primer on deep learning
