
OmniLearn: Facilitating All Jet Physics Tasks


Vinicius M. Mikuni

vmikuni@lbl.gov

vinicius-mikuni


Foundational Models


Option 1: Human language is the communication medium between the user and the machine

  • Possibility of exchanging ideas and interpolating human knowledge
  • Requires strong reasoning capabilities to understand the connection between disciplines
  • Large models required for generality


Option 2: Data is the communication medium between the user and the machine

  • Data is ingested as is, allowing the model to learn complicated correlations
  • Requires domain knowledge: what to represent? How?
  • Smaller models (O(1M) parameters) can be used instead, given domain knowledge


Jets


Jets are the most common signatures at the LHC

  • Complicated signature: O(10-100) objects are clustered in each jet
  • Choice of data: Particle Flow objects associated with jets
  • Choice of data representation: Point Clouds

[Figure: jets appear in measurements, searches, and tagging tasks.]

How to teach AI about jets?

Encoding jet information

Create a neural network model that aims to accomplish two tasks:

  • Classify jets: learns the difference in radiation between jet types
  • Generate jets: implicitly learn the likelihood of jets for different partons


Diffusion 101

Diffusion models are the go-to choice for data generation:

  • Simple training: take data x, perturb it with a Gaussian of mean μ and std σ: x' = μ·x + σ·ε, with ε ~ N(0, 1)
  • Ask the network D to predict the injected noise: L = ||D(x') − ε||² (sketched below)
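A minimal PyTorch sketch of one such training step (the cosine schedule tying μ and σ together is an illustrative assumption, not necessarily the paper's choice):

```python
# Illustrative diffusion training step. The (mu, sigma) schedule is an
# assumption, chosen so that mu^2 + sigma^2 = 1.
import math
import torch

def diffusion_loss(model, x):
    # x: (batch, n_features). Sample a perturbation strength per example.
    t = torch.rand(x.shape[0], 1)
    mu = torch.cos(0.5 * math.pi * t)      # signal scale
    sigma = torch.sin(0.5 * math.pi * t)   # noise scale
    eps = torch.randn_like(x)              # eps ~ N(0, 1)
    x_pert = mu * x + sigma * eps          # x' = mu*x + sigma*eps
    # The network D predicts the injected noise: L = ||D(x') - eps||^2
    return ((model(x_pert, t) - eps) ** 2).mean()
```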


Encoding jet information

Combining the two objectives, classification and generation, gives the training loss:

  • Loss = CE(x) + ||D(x') − ε||² + μ²·CE(x')
  • Classifying the perturbed inputs gives data augmentation for free (see the sketch below)!
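A sketch of the combined objective, assuming a hypothetical model that returns class logits together with the noise prediction (this is not the actual OmniLearn interface):

```python
# Joint classification + diffusion loss, following the formula above.
import math
import torch
import torch.nn.functional as F

def omnilearn_style_loss(model, x, labels):
    # Perturb the inputs as in the diffusion step.
    t = torch.rand(x.shape[0], 1)
    mu = torch.cos(0.5 * math.pi * t)
    sigma = torch.sin(0.5 * math.pi * t)
    eps = torch.randn_like(x)
    x_pert = mu * x + sigma * eps

    logits_clean, _ = model(x, torch.zeros_like(t))   # classify clean jets
    logits_pert, eps_pred = model(x_pert, t)          # classify + denoise

    ce_clean = F.cross_entropy(logits_clean, labels)           # CE(x)
    diffusion = ((eps_pred - eps) ** 2).mean()                 # ||D(x') - eps||^2
    ce_pert = (mu.squeeze(1) ** 2 *                            # mu^2 * CE(x')
               F.cross_entropy(logits_pert, labels, reduction="none")).mean()
    return ce_clean + diffusion + ce_pert
```

Note that the μ² weight makes the classification term count less for heavily perturbed inputs, where the labels are hardest to recover.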


Encoding jet information

Point-Edge Transformer (PET)

  • Combine local information with graphs
  • Learn global information with Transformers: 3M parameters in total (rough sketch below)
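A rough PyTorch sketch of the idea (k-nearest-neighbor edge features feeding a transformer); the real PET architecture has more structure, so treat this purely as a schematic:

```python
# Schematic Point-Edge Transformer: local kNN edge features + global attention.
import torch
import torch.nn as nn

class PETSketch(nn.Module):
    def __init__(self, n_feat=4, dim=64, k=8, n_classes=10):
        super().__init__()
        self.k = k
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * n_feat, dim), nn.GELU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                  # x: (batch, particles, features)
        b, n, f = x.shape
        idx = torch.cdist(x, x).topk(self.k, largest=False).indices   # kNN
        nbrs = torch.gather(
            x.unsqueeze(1).expand(b, n, n, f), 2,
            idx.unsqueeze(-1).expand(b, n, self.k, f))
        # Edge features: concatenate each particle with its neighbors,
        # embed, and average over the neighborhood.
        local = self.edge_mlp(
            torch.cat([x.unsqueeze(2).expand_as(nbrs), nbrs], dim=-1)).mean(2)
        h = self.transformer(local)        # global attention across particles
        return self.head(h.mean(1))        # jet-level class logits
```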


Input Dropout

Not all datasets contain the same information:

  • Let the model learn with and without some features
  • Feature Dropout: With fixed probability, set some of the input features to 0

[Diagram: input features f1, ..., f8; with probability p = 0.1 the first four features are zeroed, giving (0, 0, 0, 0, f5, f6, f7, f8); with p = 0.9 all eight features are kept.]
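A minimal sketch of this feature dropout (the per-jet granularity and the choice of droppable features are illustrative assumptions):

```python
# Zero a fixed subset of input features with probability p.
import torch

def feature_dropout(x, drop_idx=(0, 1, 2, 3), p=0.1):
    # x: (batch, particles, features); drop the same features for a whole jet.
    drop = torch.rand(x.shape[0], 1, 1) < p
    x_dropped = x.clone()
    x_dropped[..., list(drop_idx)] = 0.0
    return torch.where(drop, x_dropped, x)
```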


Comparison Between Models

Language inspired models

  • Data are tokenized
  • Unsupervised and general pre-training
  • Big models often required

OmniLearn

  • Data are continuous
  • HEP has one of the best simulators across all sciences: supervised pre-training
  • Medium-sized models that fit on standard GPUs are still useful


Training

JetClass dataset used for training

  • 100M jets
  • 10 different jet categories; AK8 jets simulated in pp collisions with MadGraph + Pythia8 and the CMS Delphes detector simulation

Use the pre-trained model as the starting point and fine-tune using different datasets


Huilin Qu, Congqiao Li, Sitian Qian, arXiv:2202.03772


Evaluation

2 different jet categories; AK8 jets simulated in pp collisions with MadGraph + Pythia8 and the ATLAS Delphes detector simulation


Better than all non-fine-tuned models and similar to ParT performance

Evaluation datasets: 1


Evaluation

2 different jet categories; AK4 jets simulated in pp collisions with MadGraph + Pythia8 and the CMS Delphes detector simulation


Better than all non-fine-tuned models and similar to ParT performance

Evaluation datasets: 2


Evaluation


Faster training and better convergence


Evaluation

2 different jet categories; AK5 jets simulated in pp collisions with Pythia6, with Geant4 simulation + CMS particle-flow reconstruction

21

Evaluation datasets: 3


Evaluation

2 different jet categories; AK10 jets simulated in ep collisions with Rapgap, with Geant3 simulation + H1 particle-flow reconstruction


Evaluation datasets: 4


Jet Generation


Evaluation datasets: 6

Great generation quality across multiple metrics


Application Highlight


FastSim to FullSim


Evaluation datasets: 7

OmniLearn is trained on cheap Delphes simulations. Can we fine-tune to Run 2 ATLAS Full simulation + Reconstruction?

  • Matches SOTA with 10% of the data
  • Improves on SOTA if all events are used


Unfolding

[Diagram: detector-level data (what we measure) vs. particle-level distributions (what we want).]


OmniFold


Source: Andreassen et al. PRL 124, 182001 (2020)

Two-step iterative process (sketched below):

  • Step 1: Reweight simulations to look like data
  • Step 2: Convert the learned weights into functions of particle-level objects
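Schematically, the iteration looks like the pseudocode below, where reweight stands for a hypothetical helper that trains a classifier between two weighted samples and returns the resulting likelihood-ratio weights:

```python
# Schematic OmniFold loop; `reweight` is a placeholder, not a real API.
import numpy as np

def omnifold(sim_reco, sim_gen, data_reco, n_iter=4):
    nu = np.ones(len(sim_gen))   # particle-level weights: the result
    for _ in range(n_iter):
        # Step 1: reweight the reco-level simulation (weighted by nu)
        # to look like data.
        omega = nu * reweight(sim_reco, weights=nu, target=data_reco)
        # Step 2: express the learned weights as a function of
        # particle-level objects: reweight (sim_gen, nu) to (sim_gen, omega).
        nu = nu * reweight(sim_gen, weights=nu,
                           target=sim_gen, target_weights=omega)
    return nu
```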


ATLAS OmniFold analysis

28

OmniFold dataset consisting of Z(νν) + jets events. Unfold the particles directly and then build the jet observables.

Evaluation datasets: 8


Unfolding


Unbinned unfolding using the OmniFold workflow: more precise than traditional unfolding and more efficient than previous ML models.


Anomaly Detection

30

Evaluation datasets: 9

Bump-hunting using ML:

  • Use the background in the sideband to estimate the background in the signal region
  • Compare the estimated background with the data (toy sketch below)
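As a toy illustration of that logic (the binning, polynomial background model, and signal-region window are all assumptions):

```python
# Toy bump hunt: fit the sidebands, interpolate the background into the
# signal region (SR), and compare with the observed counts.
import numpy as np

def toy_bump_hunt(masses, sr=(3.3, 3.7), bins=50, deg=3):
    counts, edges = np.histogram(masses, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    in_sr = (centers > sr[0]) & (centers < sr[1])
    coeffs = np.polyfit(centers[~in_sr], counts[~in_sr], deg)  # sideband fit
    bkg = np.polyval(coeffs, centers[in_sr]).sum()
    obs = counts[in_sr].sum()
    return (obs - bkg) / np.sqrt(bkg)   # naive Poisson significance
```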

The two ML ingredients:

  • Generative model: estimates the background in the signal region
  • Classifier: compares the estimated background with the data


LHCO dataset


LHCO R&D dataset

  • Resonant dijet final state: A → B(qq) C(qq), with mA, mB, mC = 3.5, 0.5, 0.1 TeV


Anomaly Detection


  • Generate the full dijet system: 2 × 279 × 3 = 1674 numbers to generate (two jets, up to 279 particles each, three features per particle)
  • Classify data from background

SIC = Significance Improvement Curve (TPR/√FPR vs. TPR): "By how much can I improve the significance of a particular signal, given an initial significance?" (see the snippet below)
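For reference, a minimal way to compute a SIC curve from classifier scores, using scikit-learn's ROC utilities:

```python
# SIC: significance improvement TPR / sqrt(FPR), as a function of TPR.
import numpy as np
from sklearn.metrics import roc_curve

def sic_curve(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    keep = fpr > 0                       # avoid division by zero
    return tpr[keep], tpr[keep] / np.sqrt(fpr[keep])
```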

Previous results were limited by the amount of data in the signal region: only sensitive to the new physics when S/B > 3% (~4σ)

OmniLearn finds the new physics with S/B = 0.7% (~2σ)


Conclusion


  • OmniLearn: learn a general representation of jets
  • Evaluate OmniLearn across 9 different downstream datasets
  • Evaluate the performance on jet tagging, jet generation, unfolding, and anomaly detection
  • OmniLearn improves upon SOTA and/or converges more quickly than models trained from scratch
  • Magnify the statistical power of the data: Not only Big Data benefits from AI
  • Try it out yourself: https://github.com/ViniciusMikuni/OmniLearn/ and check out the paper: arXiv:2404.16091


THANKS!

Any questions?


Backup


ATLAS Loss Curves


OmniLearn for reweighting


OmniLearn for Unfolding


PET

Train one model that learns to classify and generate jets

  • Combine both local and global information using local edges and a transformer: Point-Edge Transformer


Diffusion Generative Models


Loss function

Straightforward loss function:

  • Cross entropy for each class
  • Perturbed data prediction from the diffusion loss
  • Classification over perturbed inputs: data augmentation!
