1 of 62

Machine Learning for QCD Studies

1

Vinicius M. Mikuni

vmikuni@lbl.gov

vinicius-mikuni

2 of 62

What I’m not talking about

2

PDF fits: see Felix and Timothy talks

3 of 62

What I’m not talking about: ML@QCD@LHC25

3

4 of 62

CERN

4

https://www.calcmaps.com/map-radius/

The Large Hadron Collider (LHC) is an accelerator facility with a roughly 3-mile radius (27 km circumference), accelerating particles to nearly the speed of light

Dinner Location

5 of 62

The Challenge

5

More Likely to Happen

6 of 62

The Challenge

6

Roughly 1 in 10 billion collisions at the LHC produces a Higgs boson

Comparison:

  • Odds of winning the Powerball: 1 in 300 million
  • Odds of being killed by a vending machine: 1 in 112 million

Source: https://stacker.com/art-culture/odds-50-random-events-happening-you

7 of 62

The Challenge

7

Bunches of protons crossing every 25 ns, resulting in hundreds of millions of collisions per second

8 of 62

The Challenge

8

Source: CMS-NOTE-2022-008

Future

Present

Future upgrades of the LHC will increase the collision rate, exceeding the current computing budget

9 of 62

Generative models

Generative models are a class of algorithms trained to transform easy-to-sample noise into data
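This idea can be illustrated with a 1D toy (purely for intuition, not any of the models discussed here): noise from a uniform distribution is pushed through a "learned" map, here simply the empirical quantile function of the target, to reproduce a data distribution.

```python
import numpy as np

# 1D toy of the core idea: a generative model is a map from easy-to-sample
# noise to the data distribution. Here the "trained model" is just the
# empirical quantile function of the target (an exponential stand-in for data).
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100_000)

# "Training": tabulate the quantile function of the data
qs = np.linspace(0.0, 1.0, 1001)
quantiles = np.quantile(data, qs)

def generate(n):
    """Sample uniform noise and push it through the learned map."""
    u = rng.uniform(size=n)
    return np.interp(u, qs, quantiles)

samples = generate(100_000)
print(round(data.mean(), 2), round(samples.mean(), 2))  # means agree closely
```

Real generative models learn this noise-to-data map in high dimensions, where no closed-form quantile function exists.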

9

10 of 62

Diffusion Generative Models

10

See also:

1: E. Dreyer, E. Gross, D. Kobylianskii, V. Mikuni, B. Nachman: e-Print: 2503.19981

2: M. Omana Kuttan, K. Zhou, J. Steinheimer, H. Stöcker: e-Print: 2502.16330

3: E. Buhmann, C. Ewen, D. A. Faroughy, et al.: e-Print: 2310.00049

4: A. Butter, N. Huetsch, S. P. Schweitzer, T. Plehn, P. Sorrenson et al.: SciPost Phys. Core 8 (2025), 026

5: V. Mikuni, B. Nachman, M. Pettee: Phys. Rev. D 108 (2023) 3, 036025

6: M. Leigh, D. Sengupta, G. Quétant, J. A. Raine, K. Zoch et al.: SciPost Phys. 16 (2024) 1, 018

11 of 62

Diffusion Generative Models

11

“Scientists at QCD@LHC working on Science and Machine Learning”

12 of 62

EIC Events

12

We generate SM events for the EIC using Pythia8

13 of 62

EIC Events

13

We generate SM events for the EIC using Pythia8

  • We need a suitable network to encode the full event information

Encode the electron separately from hadrons:

p(e,h) = p(h|e)p(e)

  • OmniLearn* model used for hadrons

*V. Mikuni, B. Nachman, Phys. Rev. D 111, L051504
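The factorization p(e,h) = p(h|e)p(e) can be sketched with a toy two-step sampler; the Gaussian "models" below are placeholders for the trained networks, not the actual OmniLearn components.

```python
import numpy as np

# Toy sketch of the factorization p(e, h) = p(h | e) p(e): draw the
# scattered electron first, then hadrons conditioned on it. The Gaussian
# "models" are placeholders for the trained networks.
rng = np.random.default_rng(1)

def sample_electron(n):
    # stand-in for p(e): one electron kinematic feature
    return rng.normal(loc=10.0, scale=2.0, size=n)

def sample_hadrons(e):
    # stand-in for p(h | e): the hadronic system balances the electron
    return -e + rng.normal(scale=0.5, size=e.shape)

e = sample_electron(50_000)  # step 1: electrons
h = sample_hadrons(e)        # step 2: hadrons given the electrons
print(round(np.corrcoef(e, h)[0, 1], 2))  # strong anti-correlation by construction
```

Conditioning the hadron model on the already-generated electron is what lets the two-step chain reproduce correlations between the lepton and the hadronic final state.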

14 of 62

EIC Events

14

2-step generation

  • Generate electron first and then hadrons

Generate multiple particle species from Pythia

  • Able to correctly generate the multiplicities

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

15 of 62

EIC Events

15

Generate multiple particle species from Pythia

  • Able to correctly generate the kinematics

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

2-step generation

  • Generate electron first and then hadrons

16 of 62

EIC Events

16

Generate multiple particle species from Pythia

  • Able to learn conservation rules

Ratio between Pythia and Diffusion model

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

2-step generation

  • Generate electron first and then hadrons

17 of 62

Future

17

Theory parameters

𝛳

Physics Prediction zp

zp~p(zp|𝛳)

Generative Models are also differentiable by design:

  • Learn how events change based on model parameters
  • Tune all parameters based on observed data

Given observed data zd, maximize L(zd | 𝛳) with respect to 𝛳
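The tuning idea can be sketched in one dimension; the Gaussian with unknown mean 𝛳 below is a stand-in for a differentiable generative model p(z|𝛳), and real tunes would backpropagate through the generator instead of using an analytic gradient.

```python
import numpy as np

# Sketch of tuning theory parameters by maximum likelihood: the "model"
# is a unit-width Gaussian with unknown mean theta, standing in for a
# differentiable generative model p(z | theta).
rng = np.random.default_rng(2)
z_data = rng.normal(loc=1.5, scale=1.0, size=10_000)  # "observed" events

def grad_log_likelihood(theta, z):
    # d/dtheta of sum_i log N(z_i | theta, 1) = sum_i (z_i - theta)
    return np.sum(z - theta)

theta, lr = 0.0, 1e-4
for _ in range(200):  # gradient ascent on log L(z_data | theta)
    theta += lr * grad_log_likelihood(theta, z_data)
print(round(theta, 2))  # converges to the sample mean, close to the true 1.5
```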

18 of 62

Event Unfolding

18

19 of 62

Unfolding

19

What we measure

What we want

20 of 62

Unfolding

20

A. Badea, A. Baty, H. Bossi, et al. arXiv:2507.14349

21 of 62

Unfolding

21

How to define the optimal binning?

  • Choice depends on the distribution and phase space
  • Need to compromise when combining results from different experiments

22 of 62

Unfolding

22

How to include multiple distributions?

  • Histograms are hard to scale: curse of dimensionality
  • Unfolding uncertainties can be reduced using additional observables

How to define the optimal binning?

  • Choice depends on the distribution and phase space
  • Need to compromise when combining results from different experiments

23 of 62

Unfolding

23

How to unfold distributions that are not defined for each event?

  • Moments of distributions
  • Energy Correlators

How to include multiple distributions?

  • Histograms are hard to scale: curse of dimensionality
  • Unfolding uncertainties can be reduced using additional observables

How to define the optimal binning?

  • Choice depends on the distribution and phase space
  • Need to compromise when combining results from different experiments

24 of 62

ML Based Unfolding

24

2-step iterative process

  • Step 1: Reweight simulations to look like data
  • Step 2: Convert the learned weights into functions of particle-level objects
  • Use classifiers to learn the reweighting functions!

Source: Andreassen et al. PRL 124, 182001 (2020)
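The two steps can be sketched with a 1D toy in which histogram ratios stand in for the classifiers that learn the reweighting functions (the real method trains neural-network classifiers and is unbinned).

```python
import numpy as np

# Toy sketch of the 2-step iterative reweighting. Histogram ratios stand
# in for the classifiers; "truth" is particle level, "reco" detector level.
rng = np.random.default_rng(3)
n = 200_000
truth_sim = rng.normal(0.0, 1.0, n)            # simulation, particle level
truth_dat = rng.normal(0.3, 1.0, n)            # "nature", particle level
reco_sim = truth_sim + rng.normal(0.0, 0.5, n)  # detector smearing
reco_dat = truth_dat + rng.normal(0.0, 0.5, n)

bins = np.linspace(-4, 4, 41)
w = np.ones(n)  # per-event simulation weights
for _ in range(3):  # a few unfolding iterations
    # Step 1: reweight the simulation to match data at detector level
    h_dat, _ = np.histogram(reco_dat, bins)
    h_sim, _ = np.histogram(reco_sim, bins, weights=w)
    ratio = np.where(h_sim > 0, h_dat / np.maximum(h_sim, 1e-9), 1.0)
    idx = np.clip(np.digitize(reco_sim, bins) - 1, 0, len(ratio) - 1)
    w_det = w * ratio[idx]
    # Step 2: turn the weights into a function of the particle-level
    # quantity (here: the average weight per truth bin)
    h_num, _ = np.histogram(truth_sim, bins, weights=w_det)
    h_den, _ = np.histogram(truth_sim, bins)
    pull = np.where(h_den > 0, h_num / np.maximum(h_den, 1e-9), 1.0)
    idx = np.clip(np.digitize(truth_sim, bins) - 1, 0, len(pull) - 1)
    w = pull[idx]

unfolded_mean = np.average(truth_sim, weights=w)
print(round(unfolded_mean, 2))  # pulled from 0.0 toward the true 0.3
```

Each iteration sharpens the weights: the detector-level reweighting corrects the observable distribution, and the pull-back to particle level keeps the weights a valid function of truth quantities.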

25 of 62

ML Based Unfolding

25

See also

1: M. Backes, A. Butter, M. Dunford, B. Malaescu: SciPost Phys. Core 7 (2024) 1, 007

2: A. Shmakov, K. T. Greif, M. J. Fenton, A. Ghosh, P. Baldi et al.: SciPost Phys. 18 (2025) 4, 117

3: N. Huetsch, J. M. Villadamigo, A. Shmakov, S. Diefenbacher, V. Mikuni et al.: SciPost Phys. 18 (2025) 2, 070

4: A. Butter, S. Diefenbacher, N. Huetsch, V. Mikuni, B. Nachman et al.: SciPost Phys. 18 (2025) 6, 200

5: M. Bellagente, A. Butter, G. Kasieczka, T. Plehn, A. Rousselot et al.: SciPost Phys. 9 (2020), 074

6: C. Pazos, S. Aeron, P.-H. Beauchemin, V. Croft, Z. Huan et al.: e-Print: 2406.01507

7: S. Diefenbacher, G.-H. Liu, V. Mikuni, B. Nachman, W. Nie: Phys. Rev. D 109 (2024) 7, 076011

Source: Andreassen et al. PRL 124, 182001 (2020)

26 of 62

The H1 Detector

26

One of the two multipurpose detectors at the HERA accelerator facility

  • Data taking from 1992 to 2007 colliding electrons/positrons against protons
  • Huge data preservation effort to modernize the software and preserve the data

27 of 62

ML Based Unfolding

27

3 papers on ML-based unfolding using H1 data

28 of 62

Azimuthal Asymmetries

28

Study of correlations between the scattered lepton and jet

Phys.Rev.Lett. 128 (2022) 13, 132002

29 of 62

Azimuthal Asymmetries

Final-state lepton and jet are mostly back-to-back

  • Imbalance can arise from perturbative initial/final-state radiation
  • Target the region dominated by soft gluon emissions: q⟂ ≪ P⟂
  • Provides information for TMD PDF measurements, where the soft gluon contribution can be factorized

29

  • kl⟂: transverse momentum of the scattered lepton
  • kJ⟂: transverse momentum of the jet

Measure: cos(ɸ), cos(2ɸ), cos(3ɸ)

Require q⟂ / P⟂ < 0.3

30 of 62

Azimuthal Asymmetries

Reuse previous results from PRL 128, 132002

  • Quantities previously unfolded

30

31 of 62

Results

31

Dedicated DIS generators describe the data well everywhere, especially Rapgap

Pythia predictions are not tuned to this data

GBW includes gluon saturation effects, while CT18A uses NLO TMD calculations with collinear PDFs; both are currently available only at low q⟂

arXiv:2412.14092, submitted to PLB

32 of 62

What if we unfold everything?

32

33 of 62

Experimental setup

Using 228 pb⁻¹ of data collected by the H1 experiment during 2006 and 2007 at a 318 GeV center-of-mass energy

33

Reconstructed hadrons using combined detector information: energy flow algorithm

27.5 GeV e± (k)

920 GeV p (P)

Q² = −q²

y = (P·q) / (P·k)

P: incoming proton 4-vector

k: incoming electron 4-vector

q = k − k′: 4-momentum transfer

Goal: Include the information of all reconstructed particles + scattered lepton in the collision
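The definitions above can be checked numerically for the HERA beams (masses neglected); the scattered-electron kinematics in the example are invented for illustration.

```python
import numpy as np

# Numerical check of the DIS kinematics for the HERA beam setup.
def mdot(a, b):
    # Minkowski product for (E, px, py, pz) with signature (+, -, -, -)
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

k = np.array([27.5, 0.0, 0.0, -27.5])   # incoming electron along -z
P = np.array([920.0, 0.0, 0.0, 920.0])  # incoming proton along +z
s = mdot(k + P, k + P)
print(round(np.sqrt(s), 1))             # ~318 GeV center-of-mass energy

# hypothetical scattered electron: 15 GeV at a polar angle of 140 degrees
th = np.radians(140.0)
kp = 15.0 * np.array([1.0, np.sin(th), 0.0, np.cos(th)])
q = k - kp
Q2 = -mdot(q, q)
y = mdot(P, q) / mdot(P, k)
print(round(Q2, 1), round(y, 3))  # inside Q2 > 150 GeV2 and 0.2 < y < 0.7
```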

34 of 62

OmniLearn

34

We use the OmniLearn model to train the classifiers for the unfolding task:

  • Same Model used for the generation of EIC events, but now focused on classification

More details at: V. Mikuni, B. Nachman, Phys. Rev. D 111, L051504

35 of 62

Results

Cluster unfolded jets using kT algorithm with radius of 1.0

We are able to re-derive past results

35

Phys. Rev. Lett. 128 (2022) 132002
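The clustering step can be sketched with a naive O(n³) kT implementation; real analyses use FastJet, and the pt-weighted (eta, phi) recombination here is a simplification of full four-vector recombination.

```python
import numpy as np

# Minimal kT clustering sketch with R = 1.0. Particles are (pt, eta, phi);
# d_iB = pt_i^2, d_ij = min(pt_i, pt_j)^2 * dR_ij^2 / R^2.
def kt_cluster(particles, R=1.0):
    parts = [list(p) for p in particles]
    jets = []
    while parts:
        # smallest beam distance as the starting candidate
        best_d, best_pair = min((p[0] ** 2, (i, -1)) for i, p in enumerate(parts))
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                deta = parts[i][1] - parts[j][1]
                dphi = (parts[i][2] - parts[j][2] + np.pi) % (2 * np.pi) - np.pi
                d_ij = min(parts[i][0], parts[j][0]) ** 2 * (deta ** 2 + dphi ** 2) / R ** 2
                if d_ij < best_d:
                    best_d, best_pair = d_ij, (i, j)
        i, j = best_pair
        if j == -1:
            jets.append(parts.pop(i))  # beam distance wins: promote to a jet
        else:
            pj = parts.pop(j)          # j > i, so pop j first
            pi = parts.pop(i)
            pt = pi[0] + pj[0]         # simplified pt-weighted recombination
            parts.append([pt,
                          (pi[0] * pi[1] + pj[0] * pj[1]) / pt,
                          (pi[0] * pi[2] + pj[0] * pj[2]) / pt])
    return jets

# two collimated particles plus one far-away particle -> two jets
jets = kt_cluster([(10.0, 0.0, 0.0), (5.0, 0.1, 0.1), (8.0, 2.5, 2.0)])
print(len(jets), sorted(j[0] for j in jets))  # -> 2 [8.0, 15.0]
```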

36 of 62

Results

Cluster unfolded jets using kT algorithm with radius of 1.0

We are able to re-derive past results

36

Phys.Lett.B 844 (2023) 138101

37 of 62

Results

The Breit frame provides a natural frame to study ep collisions, where the struck quark forms a jet opposite to the proton beam: useful for jet and TMD studies

  • Starting from the lab frame, we need to boost the system: not trivial in terms of unfolding

37

38 of 62

Results

Cluster jets using kT algorithm with radius of 1.0

We can study observables in different frames!

38

Lab Frame

Breit Frame

39 of 62

Results

Unfold observables that are hard to unfold without machine learning: Energy-Energy Correlators (EECs)

39

Sensitive to transverse momentum dependent parton distribution functions and fragmentation functions

Eq. from Phys.Rev.D 103 (2021) 9, 094005

See also the talks from:

Simon, Ian, Jingyu
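The correlator itself is simple to compute once unbinned particle-level events are available: histogram all particle pairs, weighting each pair by its energy product. The isotropic toy events below are purely illustrative.

```python
import numpy as np

# Toy sketch of an energy-energy correlator: histogram all particle pairs,
# weighting each pair by E_i * E_j / E_tot^2, versus the opening angle chi.
rng = np.random.default_rng(4)

def eec(events, bins):
    hist = np.zeros(len(bins) - 1)
    for E, theta, phi in events:  # per-event arrays of particle kinematics
        v = np.stack([np.sin(theta) * np.cos(phi),
                      np.sin(theta) * np.sin(phi),
                      np.cos(theta)], axis=1)  # unit direction vectors
        i, j = np.triu_indices(len(E), k=1)
        chi = np.arccos(np.clip((v @ v.T)[i, j], -1.0, 1.0))
        hist += np.histogram(chi, bins, weights=E[i] * E[j] / E.sum() ** 2)[0]
    return hist / len(events)

# toy events: isotropic particles with exponentially distributed energies
events = [(rng.exponential(1.0, 20),
           np.arccos(rng.uniform(-1, 1, 20)),
           rng.uniform(0, 2 * np.pi, 20)) for _ in range(50)]
bins = np.linspace(0.0, np.pi, 25)
total = eec(events, bins).sum()
print(round(total, 3))  # sum of i<j pair weights per event, below 0.5
```

Because the EEC is a pair-weighted histogram rather than a per-event observable, unbinned ML unfolding is what makes it accessible: any binning in chi can be chosen after the fact.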

40 of 62

OmniLearn

Combine tasks: Train one model to classify and generate particles

  • Use transformers and graph neural networks to learn the representation of particle interactions: OmniLearn

40

More details at: V. Mikuni, B. Nachman, Phys. Rev. D 111, L051504

41 of 62

Strategy

41

Model starts with random weights

Ask the model to classify and generate particle collisions

Fine-tune the model on new datasets and tasks

Jet Tagging

Unfolding

Anomaly Detection

42 of 62

Jet Tagging

42

Source: ATL-PHYS-SLIDE-2023-048

Pushing classification performance requires lots of simulated data!

30M Jets

192M Jets

43 of 62

FastSim to FullSim

43

OmniLearn is trained on fast simulations. Fine-tune to ATLAS Top Tagging Open Data Set

Full simulation + Reconstruction

  • Matches ATLAS performance with 10% of the data

44 of 62

Improving Unfolding

44

Improved precision for unfolding

  • Training time reduced by a factor of 2!

45 of 62

Conclusions

45

  • Machine learning opens the path to new opportunities in collider physics
  • Full generation of particle collisions: fast and comprehensive parameter tuning
  • Full event unfolding: even observables defined in the future can be unfolded
  • Growing number of venues to present AI advances for HEP: AI4EIC, ML4Jets, ML4PS

46 of 62

THANKS!

Any questions?

46

47 of 62

Backup

47

48 of 62

EIC Events

48

We also compare with previous diffusion model based on images

  • Using the particles directly improves the quality and avoids the need for pixelization

P. Devlin, J.-W. Qiu, F. Ringer, N. Sato: Phys. Rev. D 110 (2024) 1, 016030

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

49 of 62

Systematic uncertainties

Systematic uncertainties

  • HFS energy scale: ±1%
  • HFS azimuthal angle: ±20 mrad
  • Lepton energy: ±0.5%
  • Lepton azimuthal angle: ±1 mrad
  • Model uncertainty: differences in unfolded results between Djangoh and Rapgap
  • Non-closure uncertainty: differences between the expected and obtained values of the closure test

49

Unfolding uncertainties

50 of 62

MC Generators

Lund string hadronization model and CTEQ6L PDF set:

  • Djangoh: LO neutral-current DIS with the colour-dipole model from Ariadne
  • Rapgap: LO DIS with parton showers in the leading-log approximation

Pythia 8.3: default NNPDF3.1 PDF

  • Vincia: pT-ordered antenna shower and NNPDF3.1 PDF
  • Dire: dipole model, similar to Ariadne, with the MMHT14nlo68cl PDF

Herwig 7.2: cluster hadronization and the CT14 PDF set

50

51 of 62

Phi dependence

51

52 of 62

Experimental setup

Experimental setup

Fiducial Phase space definition:

  • 0.2 < y < 0.7
  • Q² > 150 GeV²

Particle selection:

  • pT > 0.1 GeV
  • -1 < 𝜂lab < 2.75
  • Charge information used if 𝜂lab < 2

Reco Phase space definition:

  • 0.08 < y < 0.7
  • Q² > 150 GeV²
  • pT,miss < 10 GeV
  • 45 < Σ(E − pz) < 65 GeV

Particle selection:

  • pT > 0.1 GeV
  • -1 < 𝜂lab < 2.75

  • Pass reco selection: Red -> Orange: 77%
  • Pass fiducial selection: Red -> Blue: 58%
  • Pass fiducial and reco selection: Blue -> Orange: 96%
  • Don’t pass fiducial but pass reco: Red -> Orange (without blue): 50%

52

Q² > 100 GeV²

53 of 62

Closure test

  • Use Djangoh as the pseudo-data and unfold Rapgap

53

  • Features used during the unfolding:
    • Kinematic information of all hadrons and scattered lepton

54 of 62

Pretraining

54

We would like to unfold up to 130 × 3 = 390 features simultaneously: this requires lots of data

  • Our data sample is around 500k events, but we have 20M simulated events from 2 different simulators
  • Idea: pretrain a model using only simulations and then fine-tune this model with data
  • Use this model as the starting point for all the trainings needed for the unfolding
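The pretrain-then-fine-tune idea can be illustrated with a hand-rolled logistic regression; this is a sketch of the strategy, not the actual OmniLearn setup.

```python
import numpy as np

# Toy of pretrain-then-fine-tune: "pretraining" on abundant simulation
# gives a starting point so only a few cheap steps on the small dataset
# are needed.
rng = np.random.default_rng(5)
d = 10
w_true = rng.normal(size=d)
w_true *= 3.0 / np.linalg.norm(w_true)  # fix the separation strength

def make(n):
    # labels from a noisy linear rule, shared by "simulation" and "data"
    x = rng.normal(size=(n, d))
    y = (x @ w_true + rng.normal(size=n) > 0).astype(float)
    return x, y

def train(x, y, w, steps, lr=0.1):
    for _ in range(steps):  # plain gradient descent on the BCE loss
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w = w - lr * x.T @ (p - y) / len(y)
    return w

x_sim, y_sim = make(100_000)  # large simulated sample
x_dat, y_dat = make(500)      # small "data" sample
w_pre = train(x_sim, y_sim, np.zeros(d), steps=300)  # pretraining
w_fin = train(x_dat, y_dat, w_pre, steps=20)         # cheap fine-tuning
w_scr = train(x_dat, y_dat, np.zeros(d), steps=20)   # same budget, from scratch

x_te, y_te = make(20_000)
a_fin = np.mean(((x_te @ w_fin) > 0) == y_te)
a_scr = np.mean(((x_te @ w_scr) > 0) == y_te)
print(round(a_fin, 3), round(a_scr, 3))  # pretrained start typically wins
```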

55 of 62

Generative models

55

Model       Training Stability   Scalability   Fast inference   Fidelity   Expressivity

Diffusion   Yes                  Yes           No               Yes        Yes
GANs        No                   Yes           Yes              Maybe      Yes
VAE         Maybe                Yes           Yes              Maybe      Kinda
NF          Yes                  Maybe         Maybe            Yes        Kinda

  • GANs:
    • Modern GAN architectures haven't really been explored in HEP; mostly the vanilla ones have been used, with OK results
  • VAE:
    • The KL divergence can behave poorly when the generator output changes too quickly during training, and often needs regularization
    • The reconstruction loss is often taken as the MSE, which learns only averages and makes sharp distributions blurry. For images there are tailored losses that improve this behaviour
  • NF:
    • Since the transformation needs to be invertible, bottleneck layers cannot be used, requiring very large networks even for small problems. This can be mitigated by splitting into multiple smaller networks
    • Autoregressive flows are among the best density estimators, but alone are very slow either to train or to sample (O(d²) in the slow direction); this can be overcome with distillation models

56 of 62

Score matching/denoising/diffusion

Denoising diffusion models are the current state-of-the-art generative models for image generation.

Pros:

  • Stable training: convex loss function
  • Scalability: network complexity is more sensitive to the architecture than to the dimensionality
  • Access to the data likelihood after training: similar to NFs, but the overall normalization is not required during training

Cons:

  • Slow sampling: possibly thousands of model evaluations to generate realistic images
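The cost of sampling comes from running a long chain of small steps, each requiring one score evaluation. In the 1D sketch below the score of a Gaussian target is known in closed form and stands in for the trained network.

```python
import numpy as np

# Why sampling is slow: ~1000 "model" evaluations per generated sample.
rng = np.random.default_rng(6)

def score(x):
    # d/dx log N(x | mu=2, sigma=0.5), playing the role of the network
    return -(x - 2.0) / 0.5 ** 2

x = rng.normal(size=50_000)  # start from easy-to-sample noise
eps, n_steps = 1e-3, 1000
for _ in range(n_steps):     # unadjusted Langevin dynamics
    x = x + eps * score(x) + np.sqrt(2.0 * eps) * rng.normal(size=x.shape)
print(round(x.mean(), 2), round(x.std(), 2))  # approaches mean 2, std 0.5
```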

56

57 of 62

Score-matching

57

  • The common choice for 𝜆(t) is 𝛔(t)², resulting in the denoising loss function
  • Another important result is that choosing 𝜆(t) = g(t)² yields an upper bound on the data likelihood
  • This allows maximum-likelihood training of diffusion models!
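The 𝜆(t) = 𝛔(t)² choice can be checked numerically at a single noise level: with this weighting the denoising objective becomes a simple noise-prediction regression, ‖𝛔·s(x_t) + ε‖², whose minimizer is the true score. The 1D toy below has an analytic answer.

```python
import numpy as np

# Toy check: data x0 ~ N(0,1) smeared to x_t = x0 + sigma*eps gives
# p_t = N(0, 1 + sigma^2), so the exact score is s(x) = -x / (1 + sigma^2).
rng = np.random.default_rng(8)
sigma = 0.7
x0 = rng.normal(size=500_000)
eps = rng.normal(size=x0.shape)
xt = x0 + sigma * eps  # one fixed noise level of the forward process

def weighted_dsm_loss(c):
    # candidate score s(x) = -c * x; with lambda = sigma^2 the denoising
    # score-matching loss becomes || sigma * s(x_t) + eps ||^2
    return np.mean((sigma * (-c * xt) + eps) ** 2)

scales = np.linspace(0.2, 1.4, 25)
best = scales[np.argmin([weighted_dsm_loss(c) for c in scales])]
print(round(best, 2), round(1.0 / (1.0 + sigma ** 2), 2))  # minimum near the true slope
```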

58 of 62

Likelihood estimation?

  • Data generation can also be achieved by solving the associated ODE
    • Often leads to worse samples compared to Langevin-dynamics generation
  • On the other hand, we can also use the deterministic ODE to recover the data density!

58

SDE

ODE

59 of 62

Introduction

59

60 of 62

Anomaly Detection

60

Bump-hunting using ML:

  • Use the background in the sideband to estimate the background in the signal region
  • Compare the estimated background with the data
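The sideband idea can be sketched on a toy falling "mass" spectrum: interpolate the sideband yields into the signal region and compare with the observed count. All numbers below are invented for illustration.

```python
import numpy as np

# Toy bump hunt with a sideband background estimate.
rng = np.random.default_rng(7)
bkg = 1000.0 + rng.exponential(scale=500.0, size=200_000)  # smooth background
sig = rng.normal(loc=2000.0, scale=30.0, size=800)         # small resonance
mass = np.concatenate([bkg, sig])

lo, hi = 1900.0, 2100.0                            # signal region
n_left = np.sum((mass > lo - 200) & (mass < lo))   # left sideband
n_right = np.sum((mass > hi) & (mass < hi + 200))  # right sideband
# the geometric mean interpolates exactly for an exponential falloff
expected = np.sqrt(n_left * n_right)
observed = np.sum((mass > lo) & (mass < hi))
z = (observed - expected) / np.sqrt(expected)
print(int(observed), int(expected), round(z, 1))  # excess well above 3 sigma
```

The ML versions replace the histogram interpolation with generative models of the sideband and classifiers comparing data to the estimated background, but the logic is the same.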

61 of 62

Anomaly Detection

61

Bump-hunting using ML:

  • Generative Model
  • Classifier

62 of 62

Anomaly Detection

62

  • Generate the full dijet system
  • Classify data from background

OmniLearn requires 4 times less data to identify anomalies!