Tess Smidt
2018 Alvarez Fellow
in Computing Sciences
Neural networks with Euclidean Symmetry for the Physical Sciences
Physics ∩ ML
2020.11.18
SLIDES: https://tinyurl.com/e3nn-physics-meets-ml
Tess Smidt
2018 Alvarez Fellow
in Computing Sciences
All I wanted was 3D rotation equivariance and I got...
....geometric tensors, space groups, point groups, selection rules, normal modes, degeneracy, 2nd order phase transitions, and a much better understanding of physics.
Neural networks with Euclidean Symmetry for the Physical Sciences
Physics ∩ ML
2020.11.18
SLIDES: https://tinyurl.com/e3nn-physics-meets-ml
3
The laws of physics have rotational, translational,
and (unless you’re a particle physicist) parity symmetry.
We want machine learning models that also obey this symmetry.
e.g. a network is our model of “physics”. The input to the network is our system.
(Figure: point charges q in a magnetic field B.)
4
Symmetry emerges when different ways of representing something “mean” the same thing.
Symmetry of representation vs. objects
5
Symmetry emerges when different ways of representing something “mean” the same thing.
Symmetry of representation vs. objects
Euclidean symmetry, E(3):
Symmetry of 3D space
The freedom to choose your coordinate system
3D Translation
3D Rotation
3D Inversion
Mirrors
We transform between coordinate systems with...
Symmetry emerges when different ways of representing something “mean” the same thing.
Symmetry of representation vs. objects
Euclidean symmetry, E(3):
Symmetry of 3D space
The freedom to choose your coordinate system
7
Symmetry of geometric objects
Looks the same under specific rotations, translations, and inversion (includes mirrors).
9
Neural networks are specially designed for different data types.
Assumptions about the data type are built into how the network operates.
Arrays ⇨ Dense NN
2D images
⇨ Convolutional NN
Text ⇨ Recurrent NN
Components are independent.
The same features can be found anywhere in an image. Locality.
Sequential data. Next input/output depends on input/output that has come before.
W
x
Graph ⇨ Graph (Conv.) NN
3D physical data
⇨ Euclidean NN
Data in 3D Euclidean space. Freedom to choose coordinate system.
Topological data. Nodes have features and network passes messages between nodes connected via edges.
10
Neural networks are specially designed for different data types.
Assumptions about the data type are built into how the network operates.
Thus, symmetries are encoded by tailoring network operations.
Arrays ⇨ Dense NN
2D images
⇨ Convolutional NN
Text ⇨ Recurrent NN
Components are independent.
The same features can be found anywhere in an image. Locality.
Sequential data. Next input/output depends on input/output that has come before.
W
x
Graph ⇨ Graph (Conv.) NN
3D physical data
⇨ Euclidean NN
Data in 3D Euclidean space. Equivariant to choice of coordinate system.
No symmetry!
2D-translation symmetry
(forward) time-translation symm.
permutation symmetry
3D Euclidean symmetry E(3): 3D rotations, translations, and inversion
Topological data. Nodes have features and network passes messages between nodes connected via edges.
11
H -0.21463 0.97837 0.33136
C -0.38325 0.66317 -0.70334
C -1.57552 0.03829 -1.05450
H -2.34514 -0.13834 -0.29630
C -1.78983 -0.36233 -2.36935
H -2.72799 -0.85413 -2.64566
C -0.81200 -0.13809 -3.33310
H -0.98066 -0.45335 -4.36774
C 0.38026 0.48673 -2.98192
H 1.14976 0.66307 -3.74025
C 0.59460 0.88737 -1.66708
H 1.53276 1.37906 -1.39070
Coordinates are most general, but sensitive to translations, rotations, and inversion.
Three ways to make models “symmetry-aware” for 3D data
e.g. How to make a model that “understands” the symmetry of atomic structures?
13
Approach 1:
Data Augmentation
Throw data at the problem and see what you get!
Approach 3:
Invariant models
Equivariant models
If there’s no model that naturally handles coordinates,
we will make one.
Approach 2:
Invariant Inputs
Convert your data to invariant representations so the neural network can’t possibly mess it up.
👌
😭
😍
14
Invariance vs. Equivariance (covariance) e.g. in 3D space
Does NOT change ⇨ Invariant
Changes deterministically ⇨ Equivariant
Properties of a vector under E(3)
Translation
Rotation
Inversion
3D vector
15
For 3D data, data augmentation is expensive (~500-fold augmentation),
and you still don't get a guarantee of equivariance (it's only emulated).
training without rotational symmetry
training with symmetry
16
For a function to be equivariant means that we can act on our inputs with g
OR act on our outputs with g and we get the same answer (for every operation).
For a function with invariant inputs (e.g. invariant models), g acts as the identity (no change).
Layer
in
out
g
Layer
in
out
g
=
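In symbols (a standard way of writing what the diagram shows; $D_{\text{in}}(g)$ and $D_{\text{out}}(g)$ denote how the operation $g$ acts on the input and output spaces):

$$f\big(D_{\text{in}}(g)\,x\big) = D_{\text{out}}(g)\,f(x) \quad \text{for every } g \qquad \text{(equivariant)}$$

$$f\big(D_{\text{in}}(g)\,x\big) = f(x) \qquad \text{(invariant: } D_{\text{out}}(g) \text{ is the identity)}$$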
17
Why limit yourself to equivariant functions?
You can substantially shrink the space of functions you need to optimize over.
This means you need less data to constrain your function.
All learnable functions
All learnable equivariant functions
All learnable functions constrained by your data.
Functions you actually wanted to learn.
18
Why not limit yourself to invariant functions?
You have to guarantee that your input features already
contain any necessary equivariant interactions (e.g. cross-products).
All learnable equivariant functions
Functions you actually wanted to learn.
All learnable invariant functions.
All invariant functions constrained by your data.
OR
How Euclidean Neural Networks achieve equivariance to Euclidean symmetry
(high level)
Euclidean Neural Networks encompass
Tensor Field Networks (arXiv:1802.08219)
Clebsch-Gordon Nets (arXiv:1806.09231)
3D Steerable CNNs (arXiv:1807.02547)
Cormorant (arXiv:1906.04015)
SE(3)-Transformers (arXiv:2006.10503)
e3nn (github.com/e3nn/e3nn)
(Technically, e3nn is the only one that implements inversion)
Some relevant folks… Mario Geiger, Ben Miller, Risi Kondor, Taco Cohen, Maurice Weiler, Daniel E. Worrall, Fabian B. Fuchs, Max Welling, Nathaniel Thomas, Shubhendu Trivedi,...
Euclidean Neural Networks are similar to convolutional neural networks...
Equivariant convolutional filters are based on learned radial functions and spherical harmonics...
=
Neighbor atoms
Convolution center
Euclidean Neural Networks are similar to convolutional neural networks...
Spherical harmonics of the same L transform together under rotation g.
Spherical harmonics transform in the same manner as the irreducible representations of SO(3).
Equivariant convolutional filters are based on learned radial functions and spherical harmonics...
=
Neighbor atoms
Convolution center
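Schematically (following the Tensor Field Networks construction), a filter evaluated at the relative position $\vec{r}$ of a neighbor atom takes the form

$$F^{(l)}_m(\vec{r}) = R^{(l)}(|\vec{r}|)\; Y^{(l)}_m(\hat{r}),$$

where $R^{(l)}$ is a learned radial function and $Y^{(l)}_m$ are the spherical harmonics.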
Euclidean Neural Networks are similar to convolutional neural networks...
...and geometric tensor algebra allows us to generalize scalar operations to more complex geometric tensors.
e.g. How to multiply two vectors?
scalar
vector
3x3 matrix
Equivariant convolutional filters are based on learned radial functions and spherical harmonics...
=
Neighbor atoms
Convolution center
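A minimal sketch of this decomposition in plain PyTorch (illustrative values; the 9 numbers of the outer product split into 1 + 3 + 5 irrep components):

import torch

u = torch.tensor([1.0, 2.0, 3.0])   # a vector (L=1, odd parity)
v = torch.tensor([0.5, -1.0, 2.0])  # another vector

# 1 ⊗ 1 = 0 ⊕ 1 ⊕ 2: the 3x3 outer product carries 9 numbers = 1 + 3 + 5.
scalar = torch.dot(u, v)                 # L=0: dot product (1 number)
pseudovector = torch.linalg.cross(u, v)  # L=1: cross product (3 numbers, even parity)
outer = torch.outer(u, v)
l2_part = 0.5 * (outer + outer.T) - (scalar / 3.0) * torch.eye(3)  # L=2: symmetric traceless (5 d.o.f.)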
26
The input to our network is geometry and features on that geometry.
geometry = [[x0, y0, z0],[x1, y1, z1]]
features = [
[m0, v0y, v0z, v0x, a0y, a0z, a0x]
[m1, v1y, v1z, v1x, a1y, a1z, a1x]
]
...
27
geometry = [[x0, y0, z0],[x1, y1, z1]]
features = [
[m0, v0y, v0z, v0x, a0y, a0z, a0x]
[m1, v1y, v1z, v1x, a1y, a1z, a1x]
]
Rs = [(1, 0, 1), (1, 1, -1), (1, 1, -1)]
# OR
Rs = [(1, 0, 1), (2, 1, -1)]
1 Scalar (L=0) (even parity)
2 Vectors (L=1)
(odd parity)
“Representation List”
Notation
Rs = [(copies, L, parity),...]
The input to our network is geometry and features on that geometry.
We categorize our features by how they transform under rotation and parity
as irreducible representations of O(3).
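A minimal sketch in plain Python (assuming the (copies, L, parity) convention above) that checks the irrep list matches the 7 numbers stored per atom:

# One even scalar (mass) and two odd vectors (velocity, acceleration) per atom.
Rs = [(1, 0, 1), (2, 1, -1)]

def rs_dim(Rs):
    # Each (mul, L, parity) entry contributes mul * (2L + 1) components.
    return sum(mul * (2 * L + 1) for mul, L, parity in Rs)

print(rs_dim(Rs))  # 1*1 + 2*3 = 7, matching [m, vy, vz, vx, ay, az, ax]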
What does equivariance get you?
29
Given a molecule and a rotated copy,
predicted forces are the same up to rotation.
(Predicted forces are equivariant to rotation.)
Additionally, networks generalize to molecules with similar motifs.
30
Primitive unit cells, conventional unit cells, and supercells of the same crystal produce the same output (assuming periodic boundary conditions).
31
O: 1s 2s 2s 2p 2p 3d
H: 1s 2s 2p
H: 1s 2s 2p
E(3)NNs can express tensors of atomic orbitals and predict molecular Hamiltonians in any orientation from seeing a single example.
geometry
features
We can convert local geometry into features and vice versa
via spherical harmonic projections.
E(3)NNs can manipulate geometry,
which means they can be used for generative models such as autoencoders.
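One common way to write such a projection (a sketch, not necessarily the exact expression used here): for neighbors at relative positions $\vec{r}_i$ around a center, form the coefficients

$$c^{(l)}_m = \sum_i R(|\vec{r}_i|)\; Y^{(l)}_m(\hat{r}_i),$$

i.e. expand the local geometry in spherical harmonics (optionally weighted by a radial function $R$); going back from features to geometry amounts to finding the peaks of the reconstructed signal.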
E(3)NNs are extremely data efficient.
E(3)NNs for molecular dynamics (coming soon)
Water
Data from: Zhang, L. et al. (2018). PRL 120(14), 143001.
Simon
Batzner
Boris
Kozinsky
E(3)NNs are extremely data efficient.
35
E(3)NNs for molecular dynamics (coming soon)
Water
E(3)NNs for phonon density of states (arxiv:2009.05163)
Training set of 1,200 crystal structures with 64 atom types.
Test set includes atom types never seen. Used to find high-Cv materials.
Data from Materials Project
Simon
Batzner
Boris
Kozinsky
Zhantao Chen
Nina Andrejevic
Mingda Li
E(3)NNs are extremely data efficient.
Data from: Zhang, L. et al. (2018). PRL 120(14), 143001.
Features that are consequences of fully treating Euclidean symmetry...
Vector
Pseudovector
Double-headed ray
Spiral
Rs_vector = [(1, 1, -1)]
Rs_pseudovector = [(1, 1, 1)]
Rs_doubleray = [(1, 2, 1)]
Rs_spiral = [(1, 2, -1)]
Feature 1: All data (input, intermediates, output) in E(3)NNs are geometric tensors.
Geometric tensors are the “data types” of 3D space and have many forms.
Rs_s_orbital = [(1, 0, 1)]
Rs_p_orbital = [(1, 1, -1)]
Rs_d_orbital = [(1, 2, 1)]
Rs_f_orbital = [(1, 3, -1)]
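The pattern in these examples is just parity $(-1)^L$; as a one-line sketch in the same notation:

# s, p, d, f orbitals are single copies of L = 0, 1, 2, 3 with parity (-1)**L:
Rs_orbitals = [(1, L, (-1) ** L) for L in range(4)]
# -> [(1, 0, 1), (1, 1, -1), (1, 2, 1), (1, 3, -1)]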
39
Feature 2: The outputs have equal or higher symmetry than the inputs.
Curie’s principle (1894):
input
random model 1
random model 2
random model 3
Tetrahedron
Octahedron
“When effects show certain asymmetry, this asymmetry must be found
in the causes that gave rise to them.”
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
40
“When effects show certain asymmetry, this asymmetry must be found
in the causes that gave rise to them.”
Feature 2: The outputs have equal or higher symmetry than the inputs.
Curie’s principle (1894):
input
random model 1
random model 2
random model 3
Tetrahedron
Octahedron
Implement group equivariance and get all subgroups for FREE!
e.g. space groups, point groups
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
41
Feature 2: The outputs have equal or higher symmetry than the inputs.
Symmetry compiler -- the model can't fit outputs that don't make sense symmetrically
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
42
✓
✗
Feature 2: The outputs have equal or higher symmetry than the inputs.
Symmetry compiler -- the model can't fit outputs that don't make sense symmetrically
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
43
Feature 2: The outputs have equal or higher symmetry than the inputs.
Symmetry compiler -- the model can't fit outputs that don't make sense symmetrically
Network predicts degenerate outcomes!
✓
✗
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
44
Feature 2: The outputs have equal or higher symmetry than the inputs.
Symmetry compiler -- the model can't fit outputs that don't make sense symmetrically
Network predicts degenerate outcomes!
The network does NOT know the symmetry of the inputs or outputs! It only acts equivariantly.
✓
✗
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
45
✓
✗
→ Learns anisotropic inputs. → Model can fit.
Input
Output
L = 0 + 2 + 4
L = 0
Use gradients to “find” what’s missing.
Feature 3: We can find data that is implied by symmetry.
Using gradients of the loss with respect to the input, we can find symmetry-breaking “order parameters”.
Irreps with even parity and L ≥ 2 break the degeneracy between the x and y directions.
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
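A minimal sketch of the mechanics (model, loss_fn, target, and all shapes here are illustrative stand-ins, not the paper's code): optimize the inputs rather than the weights, and watch which input components grow.

import torch

# Illustrative stand-ins; in the paper the model is an equivariant network and the
# target is a lower-symmetry output (e.g. a distorted structure).
model = lambda inputs, geometry: inputs.sum(dim=-1, keepdim=True)  # placeholder "network"
loss_fn = torch.nn.MSELoss()
geometry = torch.randn(6, 3)                    # e.g. the six vertices of an octahedron
target = torch.randn(6, 1)                      # outputs with lower symmetry than the inputs
inputs = torch.zeros(6, 4, requires_grad=True)  # start from a trivial, high-symmetry guess

optimizer = torch.optim.Adam([inputs], lr=1e-2)
for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs, geometry), target)
    loss.backward()   # gradients of the loss w.r.t. the *inputs*, not the weights
    optimizer.step()  # input components that grow are candidate symmetry-breaking order parameters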
46
Feature 3: We can find data that is implied by symmetry.
Using gradients of the loss with respect to the input, we can find symmetry-breaking “order parameters”.
Octahedral tilting in perovskites (M3+ ⊕ R4+) ⇨
Network learns equal magnitude pseudovector order parameters on B site with proper spatial patterning.
T. E. Smidt, M. Geiger, B. K. Miller. https://arxiv.org/abs/2007.02005 (2020)
47
developers of e3nn
Mario Geiger
(EPFL)
Ben Miller
(U of Amsterdam, formerly FU Berlin)
Tess Smidt
(LBNL)
Kostiantyn Lapchevskyi
e3nn: a modular PyTorch framework for Euclidean neural networks
Utilities and classes for
48
e3nn: a modular PyTorch framework for Euclidean neural networks
Creating a basic convolution E(3) neural network
import torch
from e3nn import rs
from e3nn.networks import GatedConvParityNetwork

torch.set_default_dtype(torch.float64)

N_atom_types = 3  # For example H, C, O
r_max = 3.0       # radial cutoff (not defined on the slide; example value)
Rs_in = [(N_atom_types, 0, 1)]  # Inputs are scalars
Rs_out = [(1, 1, -1)]           # Predict vectors
model_kwargs = {
    'Rs_in': Rs_in, 'Rs_out': Rs_out, 'mul': 4, 'lmax': 2,
    'layers': 3, 'max_radius': r_max, 'number_of_basis': 10,
}
model = GatedConvParityNetwork(**model_kwargs)
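Continuing the snippet above, a sketch of the equivariance check one might run on such a model. The call signature model(features, geometry), the batch layout, and the assumption that output vector components are in Cartesian (x, y, z) order all follow the 2020-era e3nn examples and may differ in other versions:

import math

features = torch.randn(1, 5, N_atom_types)  # 5 atoms, one scalar channel per atom type
geometry = torch.randn(1, 5, 3)             # atomic positions

# A rotation about z by 0.7 rad, built with plain torch.
a = 0.7
R = torch.tensor([[math.cos(a), -math.sin(a), 0.0],
                  [math.sin(a),  math.cos(a), 0.0],
                  [0.0,          0.0,         1.0]])

out = model(features, geometry)            # assumed forward signature
out_rot = model(features, geometry @ R.T)  # same structure, rotated

# Predicted vectors should rotate with the input (up to numerical error and
# any internal component ordering used by the library).
assert torch.allclose(out @ R.T, out_rot, atol=1e-5)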
49
e3nn: a modular PyTorch framework for Euclidean neural networks
Convert between Cartesian tensors (with symmetric indices) and irrep tensors, and calculate degrees of freedom (e.g. for the elasticity tensor)
import torch
from e3nn import rs
from e3nn.tensor import CartesianTensor
torch.set_default_dtype(torch.float64)
rank4 = torch.zeros(3, 3, 3, 3) # Placeholder
Rs, Q = CartesianTensor(rank4, 'ijkl=jikl=klij').to_irrep_transformation()
print("Representations: ", Rs)
print("Degrees of freedom: ", rs.dim(Rs))
>> Representations: [(2, 0, 1), (2, 2, 1), (1, 4, 1)]
>> Degrees of freedom: 21
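As a quick sanity check on that count (plain arithmetic from the printed irrep list):

# [(2, 0, 1), (2, 2, 1), (1, 4, 1)]: each (mul, L, parity) entry contributes mul * (2L + 1).
print(2 * (2 * 0 + 1) + 2 * (2 * 2 + 1) + 1 * (2 * 4 + 1))  # 2 + 10 + 9 = 21 elastic constants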
50
e3nn: a modular PyTorch framework for Euclidean neural networks
Plot a 3×3 matrix as a linear combination of spherical harmonics.
import torch
import plotly.express as px
import plotly.graph_objects as go
from e3nn.tensor import CartesianTensor, SphericalTensor

torch.set_default_dtype(torch.float64)

# Symmetric matrix
M = torch.randn(3, 3)
M = M + M.transpose(0, 1)
# Plot matrix
px.imshow(M)
matrix = CartesianTensor(M, formula='ij=ji').to_irrep_tensor()
r, f = SphericalTensor.from_irrep_tensor(matrix).plot()
# Plot SH signal
surface_plot = lambda r, f: go.Surface(
    x=r[..., 0], y=r[..., 1], z=r[..., 2],
    surfacecolor=f, showscale=False)
go.Figure([surface_plot(r, f)])
collaborators of e3nn
Boris
Kozinsky
Simon
Batzner
Josh Rackers
Thomas
Hardin
Eugene Kwan
Frank
Noé
Mingda
Li
Nina Andrejevic
Zhantao Chen
Claire
West
52
Feel free to reach out if you have any questions!
Tess Smidt
tsmidt@lbl.gov
A Quick Recap!
3D Euclidean symmetry:
rotations, translations, inversion
Different coordinate systems
⇨ same physical system
Euclidean Neural Networks are equivariant to E(3)
Convolutional filters
⇨ learned radial functions
and spherical harmonics
Geometric tensor algebra
Equivariant nonlinearities (did not discuss)
Equivariance can have unintended features.
1) Symmetry specific data types
2) Output symmetry equal to inputs
3) Grad loss wrt input can break symmetry
53
Resources on Euclidean neural networks:
e3nn Code (PyTorch):
http://github.com/e3nn/e3nn
“quick” tutorial: https://tinyurl.com/e3nn-quick-tutorial-202011
e3nn_tutorial:
http://blondegeek.github.io/e3nn_tutorial/
Papers:
Tensor Field Networks (arXiv:1802.08219)
Clebsch-Gordon Nets (arXiv:1806.09231)
3D Steerable CNNs (arXiv:1807.02547)
Cormorant (arXiv:1906.04015)
SE(3)-Transformers (arXiv:2006.10503)
TFNs on proteins (arXiv:2006.09275)
e3nn on QM9 (arXiv:2008.08461)
e3nn for symmetry breaking (arXiv:2007.02005)
E(3) and equivariance in ML (chemrxiv.12935198.v1)
e3nn for phonon DOS (arxiv:2009.05163)
My past talks (look for video / slide links):
https://blondegeek.github.io/talks
54
Calling in backup (slides)!
55
Applications so far...
56
Let g be an element of SO(3).
Spherical harmonics of a given L transform together under rotation:
$$\sum_m a_m\, Y^{(L)}_m \;\xrightarrow{\;g\;}\; \sum_m b_m\, Y^{(L)}_m, \qquad b_m = \sum_{m'} D^{(L)}_{m m'}(g)\, a_{m'}$$
D is the Wigner D-matrix. It has shape (2L+1) × (2L+1) and is a function of g.
57
Predict ab initio forces for molecular dynamics
Preliminary results originally presented at
APS March Meeting 2019.
Paper in progress.
Testing on liquid water, Euclidean neural networks (Tensor-Field Molecular Dynamics) require less data to train than traditional networks to get state of the art results.
Data set from: Zhang, L. et al. (2018). PRL 120(14), 143001.
Boris
Kozinsky
Simon
Batzner
Euclidean neural networks can manipulate geometry,
which means they can be used for generative models such as autoencoders.
geometry
features
To encode/decode, we have to be able to convert geometry into features and vice versa.
We do this via spherical harmonic projections.
60
Equivariant neural networks can learn to invert invariant representations,
which can be used to recover geometry.
The network can predict the spherical harmonic projection...
(Pipeline: invariant features + coordinate frame ⇨ ENN ⇨ peak finding.)
Josh Rackers
Thomas
Hardin
We can also build an autoencoder for geometry: e.g. an autoencoder on 3D Tetris.
(Diagram: two pooling layers followed by two unpooling layers; convolution centers are deleted at each pooling step.)
63
Other atoms
Convolution center
We encode the symmetries of 3D Euclidean space (3D translation- and 3D rotation-equivariance).
We use points. Images of atomic systems are sparse and imprecise.
vs.
We use continuous convolutions with atoms as convolution centers.
Euclidean Neural Networks are similar to convolutional neural networks...
66
Translation equivariance
Convolutional neural network ✓
Rotation equivariance
Data augmentation
Radial functions (invariant)
Want a network that both preserves geometry and exploits symmetry.
Invariant featurizations can be very expressive if well-crafted
Many invariant featurizations use equivariant operations
e.g. a (simplified) SOAP kernel for the ethane molecule C2H6
(Favored for kernel methods)
68
For a function to be equivariant means that we can act on our inputs with g
OR act on our outputs with g and we get the same answer (for every operation).
For a function to be invariant means that g acts as the identity on the output (no change).
Layer
in
out
g
Layer
in
out
g
=
69
Why limit yourself to equivariant functions?
You can substantially shrink the space of functions you need to optimize over.
This means you need less data to constrain your function.
All learnable functions
All learnable equivariant functions
All learnable functions constrained by your data.
Functions you actually wanted to learn.
70
Why not limit yourself to invariant functions?
You have to guarantee that your input features already
contain any necessary equivariant interactions (e.g. cross-products).
All learnable equivariant functions
Functions you actually wanted to learn.
All learnable invariant functions.
All invariant functions constrained by your data.
OR
71
Neural networks are specially designed for different data types.
Assumptions about the data type are built into how the network operates.
Arrays ⇨ Dense NN
2D images
⇨ Convolutional NN
Text ⇨ Recurrent NN
Components are independent.
The same features can be found anywhere in an image. Locality.
Sequential data. Next input/output depends on input/output that has come before.
W
x
Graph ⇨ Graph (Conv.) NN
3D physical data
⇨ Euclidean NN
Data in 3D Euclidean space. Freedom to choose coordinate system.
Topological data. Nodes have features and network passes messages between nodes connected via edges.
72
Neural networks are specially designed for different data types.
Assumptions about the data type are built into how the network operates.
Symmetries emerge from these assumptions.
Arrays ⇨ Dense NN
2D images
⇨ Convolutional NN
Text ⇨ Recurrent NN
Components are independent.
The same features can be found anywhere in an image. Locality.
Sequential data. Next input/output depends on input/output that has come before.
W
x
Graph ⇨ Graph (Conv.) NN
3D physical data
⇨ Euclidean NN
Data in 3D Euclidean space. Equivariant to choice of coordinate system.
No symmetry!
2D-translation symmetry
(forward) time-translation symm.
permutation symmetry
3D Euclidean symmetry E(3): 3D rotations, translations, and inversion
Topological data. Nodes have features and network passes messages between nodes connected via edges.
✓
✓
✓
✓
If you can craft a good representation -- great!
But deep learning’s specialty is feature learning.
So, maybe use a different machine learning approach (e.g. kernel methods).
Neural networks can’t mess up invariant representations.
You can use ANY neural network with an invariant representation.
Invariant representations can be used for other machine learning algorithms
(e.g. kernel methods).
74
Analogous to... the laws of (non-relativistic) physics have Euclidean symmetry,
even if systems do not.
The network is our model of “physics”. The input to the network is our system.
(Figure: point charges q in a magnetic field B.)
75
A Euclidean symmetry preserving network produces outputs that preserve
the subset of symmetries induced by the input.
O(3)
Oh
Pm-3m
(221)
SO(2) + mirrors
(C∞v)
3D rotations and inversions
2D rotation and mirrors along cone axis
Discrete rotations and mirrors
Discrete rotations, mirrors, and translations
76
Properties of a system must be compatible with symmetry.
Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?
m
m
m
m
m
m
a.
b.
c.
77
m
m
m
m
m
m
a.
b.
c.
✓
✗
✗
Properties of a system must be compatible with symmetry.
Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?
78
m
m
m
m
m
m
a.
b.
c.
✓
✗
✗
m
2m
Properties of a system must be compatible with symmetry.
Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?
79
m
m
m
m
m
m
a.
b.
c.
✓
✗
✗
m
2m
m
m
g
Properties of a system must be compatible with symmetry.
Which of these situations (inputs / outputs) are symmetrically allowed / forbidden?
80
Equivariance can have unintuitive consequences.
Partition a graph into two sets with a permutation-equivariant function, using ordered labels.
Predict node labels
[0, 1] vs. [1, 0]
81
Equivariance can have unintuitive consequences.
Partition a graph into two sets with a permutation-equivariant function, using ordered labels.
You can’t due to degeneracy.
[0, 1]
[1, 0]
[0, 1]
[1, 0]
There’s nothing to distinguish one partition to be “first” vs. “second”.
Predict node labels
[0, 1] vs. [1, 0]
Convolutions: Local vs. Global Symmetry
Convolutions capture local symmetry. Interaction of features in later layers yields global symmetry.
e.g. Coordination environments in crystals
Atomic systems form geometric motifs that can appear at multiple locations and orientations.
(Local symmetry)
Space group:
Symmetry of unit cell
(Global symmetry)
83
Translation symmetry in 2D:
Features “mean” the same thing in any location.
Symmetry emerges when different ways of representing something “mean” the same thing.
Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.
✓
✗
✓
✓
84
Translation symmetry in 2D:
Features “mean” the same thing in any location.
Symmetry emerges when different ways of representing something “mean” the same thing.
Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.
Symmetry of 2D objects
Boundaries “break” global translation symmetry.
Periodic boundary conditions preserve
discrete translation symmetry.
✓
✗
✓
✓
85
Permutation symmetry, SN:
Symmetry of sets
The freedom to list things in any order
Symmetry emerges when different ways of representing something “mean” the same thing.
Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.
86
Permutation symmetry, SN:
Symmetry of sets
The freedom to list things in any order
Symmetry of elements of a graph
Graph automorphism, specific nodes are indistinguishable (same global connectivity)
Symmetry emerges when different ways of representing something “mean” the same thing.
Representation can have symmetry, operations can preserve symmetry, and objects can have symmetry.
A bit of group theory! Don’t worry just a bit!
Formally, what are invariant vs. equivariant functions
function (neural network)...
vector in vector space
inputs
outputs
weights
...which is equivalent to writing.
A bit of group theory! Don’t worry just a bit!
Formally, what are invariant vs. equivariant functions
function (neural network)...
element of group
representation of g acting on vector space
vector in vector space
inputs
outputs
weights
...which is equivalent to writing.
A bit of group theory! Don’t worry just a bit!
Formally, what are invariant vs. equivariant functions
function (neural network)...
element of group
representation of g acting on vector space
vector in vector space
inputs
outputs
weights
...which is equivalent to writing.
equivariant to x if
A bit of group theory! Don’t worry just a bit!
Formally, what are invariant vs. equivariant functions
function (neural network)...
element of group
representation of g acting on vector space
vector in vector space
inputs
outputs
weights
...which is equivalent to writing.
If we want to be equivariant to x, this has to be the case…
weights must be “scalars”
equivariant to x if
A bit of group theory! Don’t worry just a bit!
Formally, what are invariant vs. equivariant functions
function (neural network)...
element of group
representation of g acting on vector space
vector in vector space
inputs
outputs
weights
...which is equivalent to writing.
If we want to be equivariant to x, this has to be the case…
weights must be “scalars”
equivariant to x if
(special case) invariant to x if
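In symbols (a standard way to write the definitions these slides build up; $f(x, W)$ is the layer with weights $W$, and $D_X$, $D_Y$ are the representations of $g$ on the input and output spaces):

$$f\big(D_X(g)\,x,\; W\big) = D_Y(g)\,f(x, W) \quad \text{for all } g \qquad \text{(equivariant to } x\text{)}$$

$$f\big(D_X(g)\,x,\; W\big) = f(x, W) \qquad \text{(invariant: } D_Y(g) = \mathbb{1}\text{)}$$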
93
M. Zaheer et al, Deep Sets, NeurIPS 2017
94
Convolutional neural networks can “cheat” by being sensitive to “boundaries”.
(e.g. Predict geodesics on projected maps with and without periodic boundary conditions)
User: Stebe
https://en.wikipedia.org/wiki/Gall-Peters_projection
✓
✗
✓
Nodes can be distinguished due to differing topology by latitude (e.g. poles)!
Boundaries break symmetry.
Pixels cannot be distinguished due to translation equivariance.
95
In the physical sciences...
What are our data types?
3D geometry and geometric tensors...
...which transform predictably under 3D rotation, translation, and inversion.
These data types assume Euclidean symmetry.
⇨ Thus, we need neural networks that preserve Euclidean symmetry.
96
Scalars
Vectors
Pseudovectors
Matrices, Tensors, …
Atomic orbitals
Output of Angular Fourier Transforms
Vector fields on spheres
(e.g. B-modes of the Cosmic Microwave Background)
Geometric tensors take many forms. They are a general data type beyond materials.
97
Our unit test: Trained on 3D Tetris shapes in one orientation,
these networks can perfectly identify these shapes in any orientation.
TRAIN
TEST
Chiral
98
Several groups converged on similar ideas around the same time.
Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds
(arXiv:1802.08219)
Tess Smidt*, Nathaniel Thomas*, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, Patrick Riley
Points, nonlinearity on norm of tensors
Clebsch-Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network
(arXiv:1806.09231)
Risi Kondor, Zhen Lin, Shubhendu Trivedi
Only use tensor product as nonlinearity, no radial function
3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data
(arXiv:1807.02547)
Mario Geiger*, Maurice Weiler*, Max Welling, Wouter Boomsma, Taco Cohen
Efficient framework for voxels, gated nonlinearity
*denotes equal contribution
99
Tensor field networks + 3D steerable CNNs
= Euclidean neural networks (e3nn)
100
Let g be a 3D rotation matrix.
Spherical harmonics of a given L transform together under rotation:
$$\sum_m a_m\, Y^{(L)}_m \;\xrightarrow{\;g\;}\; \sum_m b_m\, Y^{(L)}_m, \qquad b_m = \sum_{m'} D^{(L)}_{m m'}(g)\, a_{m'}$$
D is the Wigner D-matrix. It has shape (2L+1) × (2L+1) and is a function of g.
How to encode (Pooling layer). Recursively convert geometry to features.
Convolve ⇨ Bloom (make points to cluster) ⇨ Symmetric cluster (cluster bloomed points) ⇨ Combine (convolve with point origins of cluster members).
Geometry ⇨ New Geometry
How to decode (Unpooling layer). Recursively convert features to geometry.
Convolve ⇨ Bloom (make new points) ⇨ Cluster (merge duplicate points) ⇨ Combine (convolve with origin point of new points).
Geometry ⇨ New Geometry
103
Discrete geometry
Discrete geometry
Reduce geometry to single point.
Create geometry from single point.
We want to convert geometric information (3D coordinates of atomic positions)
into features on a trivial geometry (a single point)
and back again.
Single point with continuous
latent representation
(N dimensional vector)
104
Reduce geometry to single point.
Create geometry from single point.
Atomic structures are hierarchical and can be constructed from recurring geometric motifs.
We want to convert geometric information (3D coordinates of atomic positions)
into features on a trivial geometry (a single point)
and back again.
Discrete geometry
Discrete geometry
Single point with continuous
latent representation
(N dimensional vector)
105
Reduce geometry to single point.
Create geometry from single point.
(Need to do this in a recursive manner)
We want to convert geometric information (3D coordinates of atomic positions)
into features on a trivial geometry (a single point)
and back again.
Discrete geometry
Discrete geometry
Single point with continuous
latent representation
(N dimensional vector)
Atomic structures are hierarchical and can be constructed from recurring geometric motifs.
To autoencode, we have to be able to convert geometry into features and vice versa.
We do this via spherical harmonic projections.
107
What a computational materials physicist does: given an atomic structure (e.g. Si), use quantum theory and supercomputers to determine where the electrons are and what the electrons are doing.
Structure ⇨ Properties
(Figure: band structure, energy (eV) vs. momentum.)
Structure ⇨ Quantum theory / molecular dynamics + supercomputers ⇨ Properties
(Diagram arrows: Zooooom!, Hypothesize, Inverse Design, Map.)
We want to use deep learning to speed up calculations, hypothesize new structures, perform inverse design, and organize these relations.
The problems start here
110
Given a single example of a degenerate solution,
it knows what other solutions are possible by symmetry.
(Useful for ensuring you’re not biasing your sampling.)
111
To be rotation-equivariant means that we can rotate our inputs
OR rotate our outputs and we get the same answer (for every operation).
Layer
in
out
Rot
Layer
in
out
Rot
=
112
For L=1 ⇨ L=1, the filters will be learned, radially-dependent linear combinations of the L = 0, 1, and 2 spherical harmonics.
(Animation: random filters for L=1 ⇨ L=1 — 3 input L=1 channels by 3 output L=1 channels — shown as a function of increasing r for 0 ≤ r ≤ rmax. Radial distance gives the magnitude as a function of angle, with sign (+ / –).)
113
114
Predictions for Oh symmetry
Ground Truth
Prediction of network trained with symmetry breaking input and given symmetry breaking input along z.
Prediction of network trained with symmetry breaking input but given trivial input
(single scalar).
Superposition of 6 rotationally degenerate solutions.
115
A brief primer on deep learning
deep learning ⊂ machine learning ⊂ artificial intelligence
model | deep learning | data | cost function | way to update parameters | conv. nets
116
model (“neural network”):
Function with learnable parameters.
A brief primer on deep learning
117
model (“neural network”):
Function with learnable parameters.
Linear transformation
Element-wise nonlinear function
Learned
Parameters
Ex: "Fully-connected" network
A brief primer on deep learning
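In a formula (the standard form of the layer this slide sketches):

$$f(x) = \sigma(W x + b),$$

where $W$ and $b$ are the learned parameters and $\sigma$ is the element-wise nonlinear function.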
118
model (“neural network”):
Function with learnable parameters.
Neural networks with multiple layers can learn more complicated functions.
Learned
Parameters
Ex: "Fully-connected" network
A brief primer on deep learning
119
model (“neural network”):
Function with learnable parameters.
Neural networks with multiple layers can learn more complicated functions.
Learned
Parameters
Ex: "Fully-connected" network
A brief primer on deep learning
120
deep learning:
Add more layers.
A brief primer on deep learning
121
data:
Want lots of it. Model has many parameters. Don't want to easily overfit.
https://en.wikipedia.org/wiki/Overfitting
A brief primer on deep learning
122
cost function:
A metric to assess how well the model is performing.
The cost function is evaluated on the output of the model.
Also called the loss or error.
A brief primer on deep learning
123
way to update parameters:
Construct a model that is differentiable
Easiest to do with differentiable programming frameworks: e.g. Torch, TensorFlow, JAX, ...
Take derivatives of the cost function (loss or error) with respect to the learnable parameters.
This is called backpropagation (aka the chain rule).
error
A brief primer on deep learning
124
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
convolutional neural networks:
Used for images. In each layer, scan over image with learned filters.
A brief primer on deep learning
125
http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/
convolutional neural networks:
Used for images. In each layer, scan over image with learned filters.
A brief primer on deep learning