1 of 18

Evaluating Equivariance for Reconstruction

Savannah Thais, Columbia University

Daniel Murnane, Lawrence Berkeley National Lab

2 of 18

Why do some ML models outperform others?

We often heavily fine-tune models and attribute their performance to particular design choices: size, input features, hyperparameters, network design, etc.

3 of 18

Physical Inductive Biases

  • Data Structure: relational structure, ordering, feature selection, pre-processing, etc.
  • Model Constraints: restricting model weights, the learned function, propagated information, etc.
  • Task Formulation: physics-informed neural networks, incorporating conservation laws or equations through loss function design, etc.

4 of 18

Symmetries are (potentially) powerful physical inductive biases

[Diagram: a map ɸ commuting with a symmetry transformation of the data]
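
For reference, the standard definition behind this picture: a map ɸ is equivariant under a symmetry group G if transforming the input and then applying ɸ gives the same result as applying ɸ and then transforming the output,

    \phi(\rho_{\mathrm{in}}(g)\, x) = \rho_{\mathrm{out}}(g)\, \phi(x) \qquad \forall g \in G,

where \rho_{\mathrm{in}} and \rho_{\mathrm{out}} are the representations of G acting on the input and output spaces; invariance is the special case where \rho_{\mathrm{out}}(g) is the identity.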

5 of 18

Equivariant networks are popular, partially because they are (in some formulations) easy to implement

Diagram from Daniel Murnane
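
As a concrete illustration of one "easy" formulation (not the implementation of any specific model in this deck): build scalar messages only from SO(2) invariants (norms and dot products of the 2D hit positions) and output weighted sums of the input vectors, so rotating the inputs rotates the outputs. A minimal PyTorch sketch, with made-up layer names and sizes:

    import math
    import torch
    import torch.nn as nn

    class SO2EquivariantLayer(nn.Module):
        """Toy SO(2)-equivariant layer: scalar weights depend only on rotation
        invariants, and the output is a linear combination of the input vectors."""
        def __init__(self, hidden=16):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):                              # x: (N, 2) hits in the x-y plane
            norms = x.norm(dim=-1, keepdim=True)           # (N, 1) |x_i|
            dots = x @ x.T                                 # (N, N) x_i . x_j
            n = x.shape[0]
            feats = torch.stack([norms.expand(-1, n),      # |x_i| for each pair (i, j)
                                 norms.T.expand(n, -1),    # |x_j| for each pair (i, j)
                                 dots], dim=-1)            # (N, N, 3) rotation invariants
            weights = self.mlp(feats).squeeze(-1)          # (N, N) invariant scalar messages
            return weights @ x                             # (N, 2) rotates with the input

    # Quick numerical check of equivariance under a rotation R:
    layer, x = SO2EquivariantLayer(), torch.randn(5, 2)
    c, s = math.cos(0.3), math.sin(0.3)
    R = torch.tensor([[c, -s], [s, c]])
    assert torch.allclose(layer(x @ R.T), layer(x) @ R.T, atol=1e-4)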

6 of 18

Potential Benefits of Equivariance

Model Efficiency

  • Often the main theoretical advantage of equivariant models (inspired by CNNs and GNNs)
  • Equivariance allows models to use parameters more efficiently

Accuracy

  • Most published ML results achieve SotA accuracy
  • Models may have an ‘easier’ time learning optimal functions

Generalizability

  • Should learn complete symmetry orbit from one example
  • Learns reduced representation for easier generalization

Data Efficiency

  • Do not need to rely on data augmentation to learn symmetries
  • Should reduce training data size (and computational resource usage)

7 of 18

Physics as an ML Sandbox

Particle Tracking

  • Using TrackML Dataset
  • Build an SO(2) rotation equivariant model (in x-y plane)

Jet Tagging

  • Using Top Quark Tagging Reference Dataset
  • Build a Lorentz equivariant model (rotations and boosts in spacetime)

Image from Nelson

8 of 18

Baseline Tagging Models

  • ParticleNet: message passing dynamic graph GNN on a particle graph (arXiv:1902.08570)
  • ResNeXt: deep 2D CNN on jet images (arXiv:1611.05431)
  • PFN (Particle Flow Network): deep set network on particle features (arXiv:1810.05165); see the deep-set sketch after this list
  • EFP (Energy Flow Polynomials): linear discriminant on the EFP complete linear basis (arXiv:1712.07124)
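
To make the "deep set network on particle features" idea concrete, a minimal sketch of the generic deep-set pattern (per-particle embedding, permutation-invariant sum pooling, then a classifier). The names and layer sizes are placeholders, not the published PFN architecture:

    import torch
    import torch.nn as nn

    class DeepSetTagger(nn.Module):
        """Toy deep-set jet classifier: embed each particle with phi, sum over the
        (unordered) particle set, then classify the pooled jet representation with rho."""
        def __init__(self, n_features=4, hidden=64, n_classes=2):
            super().__init__()
            self.phi = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
            self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_classes))

        def forward(self, particles):                 # particles: (batch, n_particles, n_features)
            pooled = self.phi(particles).sum(dim=1)   # permutation-invariant sum pooling
            return self.rho(pooled)                   # per-jet class logits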

9 of 18

Equivariant Tagging Models

  • LorentzNet: message passing GNN with Lorentz equivariant messages, on a particle graph (arXiv:2201.08187); see the invariant-message sketch after this list
  • LGN (Lorentz Group Network): NN with CG layers that take tensor products and decompose them into irreps using the Clebsch-Gordan map, on particle features (arXiv:2006.04780)
  • PELICAN: deep-set-style network using all totally symmetric Lorentz invariants and the full set of 15 rank 2 to rank 2 maps as aggregators (arXiv:2211.00454)
  • VecNet: message passing GNN with a Lorentz equivariant message and (optionally) an unconstrained message, on a particle graph (arXiv:2202.06941)
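
To make the "Lorentz equivariant message" idea concrete, a minimal sketch in the spirit of these models (not the code of any of them): edge features are built only from Minkowski inner products, which are unchanged by rotations and boosts, so any message MLP fed with them is Lorentz invariant, and scaling four-momenta by such scalars gives equivariant updates.

    import torch

    ETA = torch.tensor([1.0, -1.0, -1.0, -1.0])       # Minkowski metric, signature (+, -, -, -)

    def minkowski_dot(p, q):
        """Lorentz-invariant inner product of four-momenta with shape (..., 4)."""
        return (p * ETA * q).sum(dim=-1)

    def invariant_edge_features(p_i, p_j):
        """Inputs for an edge-message MLP; every entry is a Lorentz invariant."""
        return torch.stack([minkowski_dot(p_i, p_i),  # p_i^2 (mass squared)
                            minkowski_dot(p_j, p_j),  # p_j^2
                            minkowski_dot(p_i, p_j)], dim=-1)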

10 of 18

Tracking Models

  • Interaction Network: message passing GNN with node and edge updates, on a hit graph (with physics-based edge construction) (arXiv:2103.16701); a minimal update sketch follows this list
  • EuclidNet: message passing GNN with SO(2)-equivariant message construction, on a hit graph (with physics-based edge construction) (arXiv:2304.05293)
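
A minimal sketch of the node-and-edge update pattern described above, in generic interaction-network style (not the cited implementation; names and sizes are placeholders):

    import torch
    import torch.nn as nn

    class InteractionLayer(nn.Module):
        """Toy interaction-network block: update each edge from its two endpoint hits,
        then update each hit from the sum of its incoming edge messages."""
        def __init__(self, node_dim=3, edge_dim=4, hidden=32):
            super().__init__()
            self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim + edge_dim, hidden),
                                          nn.ReLU(), nn.Linear(hidden, edge_dim))
            self.node_mlp = nn.Sequential(nn.Linear(node_dim + edge_dim, hidden),
                                          nn.ReLU(), nn.Linear(hidden, node_dim))

        def forward(self, x, edge_index, e):
            # x: (N, node_dim) hit features; edge_index: (2, E) long tensor; e: (E, edge_dim)
            src, dst = edge_index
            e = self.edge_mlp(torch.cat([x[src], x[dst], e], dim=-1))   # edge update
            agg = torch.zeros(x.size(0), e.size(-1), dtype=e.dtype)     # (N, edge_dim)
            agg = agg.index_add(0, dst, e)                              # sum incoming messages
            x = self.node_mlp(torch.cat([x, agg], dim=-1))              # node update
            return x, e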

11 of 18

Evaluating Equivariance

Accuracy

  • Jet tagging: highest accuracy model is equivariant, but not all equivariant models perform well
  • Tracking: for small models equivariant models have highest accuracy, but performance plateaus as models grow
  • Overall, relationship between equivariance and accuracy is unclear (confounding factors remain)

Model Efficiency

  • Jet tagging: regression model with physics inputs is most efficient. Semi-equivariant model is also efficient.
  • Tracking: relationship changes with model size
  • Overall, equivariance does not seem to contribute directly to model efficiency

Tagging:

Model       | Accuracy | AUC    | Parameters | Ant Factor
ResNeXt     | 0.936    | 0.984  | 1.46M      | 4.28
ParticleNet | 0.938    | 0.985  | 498k       | 13.4
PFN         | 0.932    | 0.982  | 82k        | 67.8
EFP         | 0.932    | 0.980  | 1k         | 5000
LGN         | 0.929    | 0.964  | 4.5k       | 617
VecNet.1    | 0.935    | 0.984  | 633k       | 9.87
VecNet.2    | 0.931    | 0.981  | 15k        | 350
PELICAN     | 0.943    | 0.987  | 45k        | 171
LorentzNet  | 0.942    | 0.9868 | 220k       | 35

Tracking:

Model          | N Hidden | AUC    | Parameters | Ant Factor
EuclidNet      | 8        | 0.9913 | 967        | 11887
InteractionNet | 8        | 0.9849 | 1432       | 4625
EuclidNet      | 16       | 0.9932 | 2580       | 5700
InteractionNet | 16       | 0.9932 | 4392       | 3348
EuclidNet      | 32       | 0.9941 | 4448       | 3811
InteractionNet | 32       | 0.9978 | 6448       | 7049

Ant factor = 10^5 / [(1 − AUC) × N_p], where N_p is the number of parameters
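
The ant factor rewards models that reach high AUC with few parameters. A small helper following the formula above (spot-checked against the ParticleNet row of the tagging table):

    def ant_factor(auc, n_params):
        """Ant factor = 1e5 / ((1 - AUC) * N_p), as defined on this slide."""
        return 1e5 / ((1.0 - auc) * n_params)

    print(round(ant_factor(0.985, 498_000), 1))   # ParticleNet: ~13.4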

12 of 18

Evaluating Equivariance

Generalizability

  • Jet tagging: equivariant models generalize, but not all to the same extent
  • Tracking: both equivariant and sufficiently large non-equivariant models generalize
  • Overall, equivariance provides a good amount of generalization, but other models can achieve it too (tradeoffs)

Data Efficiency

  • Jet tagging: clear benefit from equivariance in very small data regimes: LorentzNet achieves 99% of its full accuracy with just 0.5% of the training dataset
    • Compared to 97% for the non-equivariant ParticleNet
  • Overall, this seems to be the most replicable benefit of equivariance; it has also been demonstrated in other papers, such as NequIP

[Plots: generalizability results for tagging and tracking]

Tagging (data efficiency):

Model       | Training % | Accuracy | AUC
LorentzNet  | 0.5%       | 0.932    | 0.9793
ParticleNet | 0.5%       | 0.913    | 0.9687
LorentzNet  | 1%         | 0.932    | 0.9812
ParticleNet | 1%         | 0.919    | 0.9734
LorentzNet  | 5%         | 0.937    | 0.9839
ParticleNet | 5%         | 0.931    | 0.9839

13 of 18

So, What Now?

What kinds of inductive biases are useful? How are they useful?

14 of 18

Over-constraint?

Is full equivariance the right approach for HEP tasks?

  • Unconstrained models can learn to generalize under symmetry transformations
  • VecNet studies show optimal accuracy and model efficiency are achieved with mixed equivariant and non-equivariant information
  • While the underlying physics obeys symmetries, the observed data is likely NOT fully symmetric

15 of 18

Expressivity?

Equivariance is a bit of an ill-defined model characteristic

  • Different methods of enforcing equivariance yield substantially different results
  • Not proven that it is easier to find the global optimum in a constrained optimization space
  • High-performing HEP models rely on other design choices besides equivariance

Fundamental work on network expressivity indicates that feature choice and message construction largely determine expressivity

16 of 18

Transformers?

Transformers provide excellent performance with minimal or no inductive bias…

In a many-parameter, big-data regime, does physics matter at all?

17 of 18

Three Takeaways

Consider the Physics Goals

There is no one-size-fits-all solution to building the optimal model for a physics task. Trade-offs between model size, compute resources, data availability, robustness, etc. are key.

Explore Other Inductive Biases

Consider inductive biases that modify data structures or task design, rather than constraining the optimization space.

More Systematic Studies

Conduct apples-to-apples model comparisons that are better able to isolate the impact of design choices, and carefully designed ablation studies.

18 of 18

What’s Next?

Let’s discuss!

st3565@columbia.edu

@basicsciencesav
