1 of 18

Evaluating Equivariance for Reconstruction

Savannah Thais, Columbia University

Daniel Murnane, Lawrence Berkeley National Lab

2 of 18

Why do some ML models outperform others?

We often heavily fine-tune models and attribute their performance to particular design choices: size, input features, hyperparameters, network design, etc.

3 of 18

Physical Inductive Biases

  • Data Structure: relational structure, ordering, feature selection, pre-processing, etc.
  • Model Constraints: restricting model weights, the learned function, propagated information, etc.
  • Task Formulation: physics-informed neural networks, incorporating conservation laws or equations through loss function design, etc.

4 of 18

Symmetries are (potentially) powerful physical inductive biases

[Diagram: a map ɸ commuting with a symmetry transformation of the data]
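
For reference, the standard definition behind this picture: a map ɸ is equivariant under a symmetry group G if transforming the input and then applying ɸ gives the same result as applying ɸ and then transforming the output,

    \phi(\rho_{\mathrm{in}}(g)\, x) = \rho_{\mathrm{out}}(g)\, \phi(x) \qquad \forall g \in G,

where \rho_{\mathrm{in}} and \rho_{\mathrm{out}} are the representations of G acting on the input and output spaces; invariance is the special case where \rho_{\mathrm{out}}(g) is the identity.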

5 of 18

Equivariant networks are popular, partially because they are (in some formulations) easy to implement

Diagram from Daniel Murnane
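
As a concrete illustration of one "easy" formulation (not the implementation of any specific model in this deck): build scalar messages only from SO(2) invariants (norms and dot products of the 2D hit positions) and output weighted sums of the input vectors, so rotating the inputs rotates the outputs. A minimal PyTorch sketch, with made-up layer names and sizes:

    import math
    import torch
    import torch.nn as nn

    class SO2EquivariantLayer(nn.Module):
        """Toy SO(2)-equivariant layer: scalar weights depend only on rotation
        invariants, and the output is a linear combination of the input vectors."""
        def __init__(self, hidden=16):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):                              # x: (N, 2) hits in the x-y plane
            norms = x.norm(dim=-1, keepdim=True)           # (N, 1) |x_i|
            dots = x @ x.T                                 # (N, N) x_i . x_j
            n = x.shape[0]
            feats = torch.stack([norms.expand(-1, n),      # |x_i| for each pair (i, j)
                                 norms.T.expand(n, -1),    # |x_j| for each pair (i, j)
                                 dots], dim=-1)            # (N, N, 3) rotation invariants
            weights = self.mlp(feats).squeeze(-1)          # (N, N) invariant scalar messages
            return weights @ x                             # (N, 2) rotates with the input

    # Quick numerical check of equivariance under a rotation R:
    layer, x = SO2EquivariantLayer(), torch.randn(5, 2)
    c, s = math.cos(0.3), math.sin(0.3)
    R = torch.tensor([[c, -s], [s, c]])
    assert torch.allclose(layer(x @ R.T), layer(x) @ R.T, atol=1e-4)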

6 of 18

Potential Benefits of Equivariance

Model Efficiency

  • Often the main theoretical advantage of equivariant models (inspired by CNNs and GNNs)
  • Equivariance allows models to use parameters more efficiently

Accuracy

  • Most published ML results achieve SotA accuracy
  • Models may have an ‘easier’ time learning optimal functions

Generalizability

  • Should learn complete symmetry orbit from one example
  • Learns reduced representation for easier generalization

Data Efficiency

  • Do not need to rely on data augmentation to learn symmetries
  • Should reduce training data size (and computational resource usage)

7 of 18

Physics as an ML Sandbox

Particle Tracking

  • Using TrackML Dataset
  • Build an SO(2) rotation equivariant model (in x-y plane)

Jet Tagging

  • Using Top Quark Tagging Reference Dataset
  • Build a Lorentz equivariant model (rotations and boosts in spacetime)

Image from Nelson

8 of 18

Baseline Tagging Models

  • ParticleNet: message passing dynamic graph GNN on a particle graph (arXiv:1902.08570)
  • ResNeXt: deep 2D CNN on jet images (arXiv:1611.05431)
  • PFN (Particle Flow Network): deep set network on particle features (arXiv:1810.05165); see the deep-set sketch after this list
  • EFP (Energy Flow Polynomials): linear discriminant on the EFP complete linear basis (arXiv:1712.07124)
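
To make the "deep set network on particle features" idea concrete, a minimal sketch of the generic deep-set pattern (per-particle embedding, permutation-invariant sum pooling, then a classifier). The names and layer sizes are placeholders, not the published PFN architecture:

    import torch
    import torch.nn as nn

    class DeepSetTagger(nn.Module):
        """Toy deep-set jet classifier: embed each particle with phi, sum over the
        (unordered) particle set, then classify the pooled jet representation with rho."""
        def __init__(self, n_features=4, hidden=64, n_classes=2):
            super().__init__()
            self.phi = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
            self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_classes))

        def forward(self, particles):                 # particles: (batch, n_particles, n_features)
            pooled = self.phi(particles).sum(dim=1)   # permutation-invariant sum pooling
            return self.rho(pooled)                   # per-jet class logits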

9 of 18

Equivariant Tagging Models

  • LorentzNet: message passing GNN with Lorentz equivariant messages, on a particle graph (arXiv:2201.08187); see the invariant-message sketch after this list
  • LGN (Lorentz Group Network): NN with CG layers that take tensor products and decompose them into irreps using the Clebsch-Gordan map, on particle features (arXiv:2006.04780)
  • PELICAN: deep-set-style network using all totally symmetric Lorentz invariants and the full set of 15 rank 2 to rank 2 maps as aggregators (arXiv:2211.00454)
  • VecNet: message passing GNN with a Lorentz equivariant message and (optionally) an unconstrained message, on a particle graph (arXiv:2202.06941)
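
To make the "Lorentz equivariant message" idea concrete, a minimal sketch in the spirit of these models (not the code of any of them): edge features are built only from Minkowski inner products, which are unchanged by rotations and boosts, so any message MLP fed with them is Lorentz invariant, and scaling four-momenta by such scalars gives equivariant updates.

    import torch

    ETA = torch.tensor([1.0, -1.0, -1.0, -1.0])       # Minkowski metric, signature (+, -, -, -)

    def minkowski_dot(p, q):
        """Lorentz-invariant inner product of four-momenta with shape (..., 4)."""
        return (p * ETA * q).sum(dim=-1)

    def invariant_edge_features(p_i, p_j):
        """Inputs for an edge-message MLP; every entry is a Lorentz invariant."""
        return torch.stack([minkowski_dot(p_i, p_i),  # p_i^2 (mass squared)
                            minkowski_dot(p_j, p_j),  # p_j^2
                            minkowski_dot(p_i, p_j)], dim=-1)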

10 of 18

Tracking Models

  • Interaction Network: message passing GNN with node and edge updates, on a hit graph (with physics-based edge construction) (arXiv:2103.16701); a minimal update sketch follows this list
  • EuclidNet: message passing GNN with SO(2)-equivariant message construction, on a hit graph (with physics-based edge construction) (arXiv:2304.05293)
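
A minimal sketch of the node-and-edge update pattern described above, in generic interaction-network style (not the cited implementation; names and sizes are placeholders):

    import torch
    import torch.nn as nn

    class InteractionLayer(nn.Module):
        """Toy interaction-network block: update each edge from its two endpoint hits,
        then update each hit from the sum of its incoming edge messages."""
        def __init__(self, node_dim=3, edge_dim=4, hidden=32):
            super().__init__()
            self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, hidden),
                                          nn.ReLU(), nn.Linear(hidden, edge_dim))
            self.node_mlp = nn.Sequential(nn.Linear(node_dim + edge_dim, hidden),
                                          nn.ReLU(), nn.Linear(hidden, node_dim))

        def forward(self, x, edge_index, e):
            # x: (N, node_dim) hit features; edge_index: (2, E) long tensor; e: (E, edge_dim)
            src, dst = edge_index
            e = self.edge_mlp(torch.cat([x[src], x[dst], e], dim=-1))   # edge update
            agg = torch.zeros(x.size(0), e.size(-1), dtype=e.dtype)     # (N, edge_dim)
            agg = agg.index_add(0, dst, e)                              # sum incoming messages
            x = self.node_mlp(torch.cat([x, agg], dim=-1))              # node update
            return x, e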

11 of 18

Evaluating Equivariance

Accuracy

  • Jet tagging: highest accuracy model is equivariant, but not all equivariant models perform well
  • Tracking: for small models equivariant models have highest accuracy, but performance plateaus as models grow
  • Overall, relationship between equivariance and accuracy is unclear (confounding factors remain)

Model Efficiency

  • Jet tagging: regression model with physics inputs is most efficient. Semi-equivariant model is also efficient.
  • Tracking: relationship changes with model size
  • Overall, equivariance does not seem to contribute directly to model efficiency

Tagging:

Model       | Accuracy | AUC    | Parameters | Ant Factor
ResNeXt     | 0.936    | 0.984  | 1.46M      | 4.28
ParticleNet | 0.938    | 0.985  | 498k       | 13.4
PFN         | 0.932    | 0.982  | 82k        | 67.8
EFP         | 0.932    | 0.980  | 1k         | 5000
LGN         | 0.929    | 0.964  | 4.5k       | 617
VecNet.1    | 0.935    | 0.984  | 633k       | 9.87
VecNet.2    | 0.931    | 0.981  | 15k        | 350
PELICAN     | 0.943    | 0.987  | 45k        | 171
LorentzNet  | 0.942    | 0.9868 | 220k       | 35

Tracking:

Model          | N Hidden | AUC    | Parameters | Ant Factor
EuclidNet      | 8        | 0.9913 | 967        | 11887
InteractionNet | 8        | 0.9849 | 1432       | 4625
EuclidNet      | 16       | 0.9932 | 2580       | 5700
InteractionNet | 16       | 0.9932 | 4392       | 3348
EuclidNet      | 32       | 0.9941 | 4448       | 3811
InteractionNet | 32       | 0.9978 | 6448       | 7049

Ant factor = 10^5 / [(1 − AUC) × N_p], where N_p is the number of parameters
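
The ant factor rewards models that reach high AUC with few parameters. A small helper following the formula above (spot-checked against the ParticleNet row of the tagging table):

    def ant_factor(auc, n_params):
        """Ant factor = 1e5 / ((1 - AUC) * N_p), as defined on this slide."""
        return 1e5 / ((1.0 - auc) * n_params)

    print(round(ant_factor(0.985, 498_000), 1))   # ParticleNet: ~13.4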

12 of 18

Evaluating Equivariance

Generalizability

  • Jet tagging: equivariant models generalize, but not all to the same extent
  • Tracking: both equivariant and sufficiently large non-equivariant models generalize
  • Overall, equivariance provides a good amount of generalization, but other models can achieve it too (tradeoffs)

Data Efficiency

  • Jet tagging: clear benefit from equivariance in very small data regimes: LorentzNet achieves 99% of its full accuracy with just 0.5% of the training dataset
    • Compared to 97% for the non-equivariant ParticleNet
  • Overall, this seems to be the most replicable benefit of equivariance; it has also been demonstrated in other papers, such as NequIP

[Plots: generalizability results for tagging and tracking]

Tagging (data efficiency):

Model       | Training % | Accuracy | AUC
LorentzNet  | 0.5%       | 0.932    | 0.9793
ParticleNet | 0.5%       | 0.913    | 0.9687
LorentzNet  | 1%         | 0.932    | 0.9812
ParticleNet | 1%         | 0.919    | 0.9734
LorentzNet  | 5%         | 0.937    | 0.9839
ParticleNet | 5%         | 0.931    | 0.9839

13 of 18

So, What Now?

What kinds of inductive biases are useful? How are they useful?

14 of 18

Over-constraint?

Is full equivariance the right approach for HEP tasks?

  • Unconstrained models can learn to generalize under symmetry transformations
  • VecNet studies show optimal accuracy and model efficiency are achieved with mixed equivariant and non-equivariant information
  • While the underlying physics obeys symmetries, the observed data is likely NOT fully symmetric

15 of 18

Expressivity?

Equivariance is a bit of an ill-defined model characteristic

  • Different methods of enforcing equivariance yield substantially different results
  • Not proven that it is easier to find the global optimum in a constrained optimization space
  • High-performing HEP models rely on other design choices besides equivariance

Fundamental work on network expressivity indicates that feature choice and message construction largely determine expressivity

16 of 18

Transformers?

Transformers provide excellent performance with minimal or no inductive bias…

In a many-parameter, big-data regime, does physics matter at all?

17 of 18

Three Takeaways

Consider the Physics Goals

There is no one-size-fits-all solution to building the optimal model for a physics task. Trade-offs between model size, compute resources, data availability, robustness, etc. are key.

Explore Other Inductive Biases

Consider inductive biases that modify data structures or task design, rather than constraining the optimization space.

More Systematic Studies

Conduct apples-to-apples model comparisons that are better able to isolate the impact of design choices, and carefully designed ablation studies.

18 of 18

What’s Next?

Let’s discuss!

st3565@columbia.edu

@basicsciencesav
