1 of 62

Machine Learning for QCD Studies

1

Vinicius M. Mikuni

vmikuni@lbl.gov

vinicius-mikuni

2 of 62

What I’m not talking about

2

PDF fits: see Felix and Timothy talks

3 of 62

What I’m not talking about: ML@QCD@LHC25

3

4 of 62

CERN

4

https://www.calcmaps.com/map-radius/

The Large Hadron Collider (LHC) is an accelerator facility with a roughly 3-mile radius (27 km circumference), accelerating particles to nearly the speed of light

Dinner Location

5 of 62

The Challenge

5

More Likely to Happen

6 of 62

The Challenge

6

Roughly 1 in 10 billion collisions at the LHC produces a Higgs boson

Comparison:

  • Odds of winning the Powerball: 1 in 300 million
  • Odds of being killed by a vending machine: 1 in 112 million

Source: https://stacker.com/art-culture/odds-50-random-events-happening-you

7 of 62

The Challenge

7

Bunches of protons crossing every 25 ns, resulting in hundreds of millions of collisions per second

8 of 62

The Challenge

8

Source: CMS-NOTE-2022-008

Future

Present

Future upgrades of the LHC will increase the collision rate, exceeding the current computing budget

9 of 62

Generative models

Generative models are a class of algorithms trained to transform easy-to-sample noise into data
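This idea can be illustrated with a 1D toy (purely for intuition, not any of the models discussed here): noise from a uniform distribution is pushed through a "learned" map, here simply the empirical quantile function of the target, to reproduce a data distribution.

```python
import numpy as np

# 1D toy of the core idea: a generative model is a map from easy-to-sample
# noise to the data distribution. Here the "trained model" is just the
# empirical quantile function of the target (an exponential stand-in for data).
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100_000)

# "Training": tabulate the quantile function of the data
qs = np.linspace(0.0, 1.0, 1001)
quantiles = np.quantile(data, qs)

def generate(n):
    """Sample uniform noise and push it through the learned map."""
    u = rng.uniform(size=n)
    return np.interp(u, qs, quantiles)

samples = generate(100_000)
print(round(data.mean(), 2), round(samples.mean(), 2))  # means agree closely
```

Real generative models learn this noise-to-data map in high dimensions, where no closed-form quantile function exists.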

9

10 of 62

Diffusion Generative Models

10

See also:

1: E. Dreyer, E. Gross, D. Kobylianskii, V. Mikuni, B. Nachman: e-Print: 2503.19981

2: M. Omana Kuttan, K. Zhou, J. Steinheimer, H. Stöcker: e-Print: 2502.16330

3: E. Buhmann, C. Ewen, D. A. Faroughy, et al.: e-Print: 2310.00049

4: A. Butter, N. Huetsch, S. P. Schweitzer, T. Plehn, P. Sorrenson et al.: SciPost Phys. Core 8 (2025), 026

5: V. Mikuni, B. Nachman, M. Pettee: Phys. Rev. D 108 (2023) 3, 036025

6: M. Leigh, D. Sengupta, G. Quétant, J. A. Raine, K. Zoch et al.: SciPost Phys. 16 (2024) 1, 018

11 of 62

Diffusion Generative Models

11

“Scientists at QCD@LHC working on Science and Machine Learning”

12 of 62

EIC Events

12

We generate SM events for the EIC using Pythia8

13 of 62

EIC Events

13

We generate SM events for the EIC using Pythia8

  • We need a suitable network to encode the full event information

Encode the electron separately from hadrons:

p(e,h) = p(h|e)p(e)

  • OmniLearn* model used for hadrons

*V. Mikuni, B. Nachman, Phys. Rev. D 111, L051504
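The factorization p(e,h) = p(h|e)p(e) can be sketched with a toy two-step sampler; the Gaussian "models" below are placeholders for the trained networks, not the actual OmniLearn components.

```python
import numpy as np

# Toy sketch of the factorization p(e, h) = p(h | e) p(e): draw the
# scattered electron first, then hadrons conditioned on it. The Gaussian
# "models" are placeholders for the trained networks.
rng = np.random.default_rng(1)

def sample_electron(n):
    # stand-in for p(e): one electron kinematic feature
    return rng.normal(loc=10.0, scale=2.0, size=n)

def sample_hadrons(e):
    # stand-in for p(h | e): the hadronic system balances the electron
    return -e + rng.normal(scale=0.5, size=e.shape)

e = sample_electron(50_000)  # step 1: electrons
h = sample_hadrons(e)        # step 2: hadrons given the electrons
print(round(np.corrcoef(e, h)[0, 1], 2))  # strong anti-correlation by construction
```

Conditioning the hadron model on the already-generated electron is what lets the two-step chain reproduce correlations between the lepton and the hadronic final state.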

14 of 62

EIC Events

14

2-step generation

  • Generate electron first and then hadrons

Generate multiple particle species from Pythia

  • Able to correctly generate the multiplicities

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

15 of 62

EIC Events

15

Generate multiple particle species from Pythia

  • Able to correctly generate the kinematics

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

2-step generation

  • Generate electron first and then hadrons

16 of 62

EIC Events

16

Generate multiple particle species from Pythia

  • Able to learn conservation rules

Ratio between Pythia and Diffusion model

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

2-step generation

  • Generate electron first and then hadrons

17 of 62

Future

17

Theory parameters

𝛳

Physics Prediction zp

zp~p(zp|𝛳)

Generative Models are also differentiable by design:

  • Learn how events change based on model parameters
  • Tune all parameters based on observed data

Given observed data zd, maximize L(zd | 𝛳) with respect to 𝛳
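The tuning idea can be sketched in one dimension; the Gaussian with unknown mean 𝛳 below is a stand-in for a differentiable generative model p(z|𝛳), and real tunes would backpropagate through the generator instead of using an analytic gradient.

```python
import numpy as np

# Sketch of tuning theory parameters by maximum likelihood: the "model"
# is a unit-width Gaussian with unknown mean theta, standing in for a
# differentiable generative model p(z | theta).
rng = np.random.default_rng(2)
z_data = rng.normal(loc=1.5, scale=1.0, size=10_000)  # "observed" events

def grad_log_likelihood(theta, z):
    # d/dtheta of sum_i log N(z_i | theta, 1) = sum_i (z_i - theta)
    return np.sum(z - theta)

theta, lr = 0.0, 1e-4
for _ in range(200):  # gradient ascent on log L(z_data | theta)
    theta += lr * grad_log_likelihood(theta, z_data)
print(round(theta, 2))  # converges to the sample mean, close to the true 1.5
```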

18 of 62

Event Unfolding

18

19 of 62

Unfolding

19

What we measure

What we want

20 of 62

Unfolding

20

A. Badea, A. Baty, H. Bossi, et al. arXiv:2507.14349

21 of 62

Unfolding

21

How to define the optimal binning?

  • Choice depends on the distribution and phase space
  • Need to compromise when combining results from different experiments

22 of 62

Unfolding

22

How to include multiple distributions?

  • Histograms are hard to scale: curse of dimensionality
  • Unfolding uncertainties can be reduced using additional observables

How to define the optimal binning?

  • Choice depends on the distribution and phase space
  • Need to compromise when combining results from different experiments

23 of 62

Unfolding

23

How to unfold distributions that are not defined for each event?

  • Moments of distributions
  • Energy Correlators

How to include multiple distributions?

  • Histograms are hard to scale: curse of dimensionality
  • Unfolding uncertainties can be reduced using additional observables

How to define the optimal binning?

  • Choice depends on the distribution and phase space
  • Need to compromise when combining results from different experiments

24 of 62

ML Based Unfolding

24

2-step iterative process

  • Step 1: Reweight simulations to look like data
  • Step 2: Convert the learned weights into functions of particle-level objects
  • Use classifiers to learn the reweighting functions!

Source: Andreassen et al. PRL 124, 182001 (2020)
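The two steps can be sketched with a 1D toy in which histogram ratios stand in for the classifiers that learn the reweighting functions (the real method trains neural-network classifiers and is unbinned).

```python
import numpy as np

# Toy sketch of the 2-step iterative reweighting. Histogram ratios stand
# in for the classifiers; "truth" is particle level, "reco" detector level.
rng = np.random.default_rng(3)
n = 200_000
truth_sim = rng.normal(0.0, 1.0, n)            # simulation, particle level
truth_dat = rng.normal(0.3, 1.0, n)            # "nature", particle level
reco_sim = truth_sim + rng.normal(0.0, 0.5, n)  # detector smearing
reco_dat = truth_dat + rng.normal(0.0, 0.5, n)

bins = np.linspace(-4, 4, 41)
w = np.ones(n)  # per-event simulation weights
for _ in range(3):  # a few unfolding iterations
    # Step 1: reweight the simulation to match data at detector level
    h_dat, _ = np.histogram(reco_dat, bins)
    h_sim, _ = np.histogram(reco_sim, bins, weights=w)
    ratio = np.where(h_sim > 0, h_dat / np.maximum(h_sim, 1e-9), 1.0)
    idx = np.clip(np.digitize(reco_sim, bins) - 1, 0, len(ratio) - 1)
    w_det = w * ratio[idx]
    # Step 2: turn the weights into a function of the particle-level
    # quantity (here: the average weight per truth bin)
    h_num, _ = np.histogram(truth_sim, bins, weights=w_det)
    h_den, _ = np.histogram(truth_sim, bins)
    pull = np.where(h_den > 0, h_num / np.maximum(h_den, 1e-9), 1.0)
    idx = np.clip(np.digitize(truth_sim, bins) - 1, 0, len(pull) - 1)
    w = pull[idx]

unfolded_mean = np.average(truth_sim, weights=w)
print(round(unfolded_mean, 2))  # pulled from 0.0 toward the true 0.3
```

Each iteration sharpens the weights: the detector-level reweighting corrects the observable distribution, and the pull-back to particle level keeps the weights a valid function of truth quantities.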

25 of 62

ML Based Unfolding

25

See also

1: M. Backes, A. Butter, M. Dunford, B. Malaescu: SciPost Phys. Core 7 (2024) 1, 007

2: A. Shmakov, K. T. Greif, M. J. Fenton, A. Ghosh, P. Baldi et al.: SciPost Phys. 18 (2025) 4, 117

3: N. Huetsch, J. M. Villadamigo, A. Shmakov, S. Diefenbacher, V. Mikuni et al.: SciPost Phys. 18 (2025) 2, 070

4: A. Butter, S. Diefenbacher, N. Huetsch, V. Mikuni, B. Nachman et al.: SciPost Phys. 18 (2025) 6, 200

5: M. Bellagente, A. Butter, G. Kasieczka, T. Plehn, A. Rousselot et al.: SciPost Phys. 9 (2020), 074

6: C. Pazos, S. Aeron, P.-H. Beauchemin, V. Croft, Z. Huan et al.: e-Print: 2406.01507

7: S. Diefenbacher, G.-H. Liu, V. Mikuni, B. Nachman, W. Nie: Phys. Rev. D 109 (2024) 7, 076011

Source: Andreassen et al. PRL 124, 182001 (2020)

26 of 62

The H1 Detector

26

One of the two multipurpose detectors at the HERA accelerator facility

  • Data taking from 1992 to 2007 colliding electrons/positrons against protons
  • Huge data preservation effort to modernize the software and preserve the data

27 of 62

ML Based Unfolding

27

3 papers on ML-based unfolding using H1 data

28 of 62

Azimuthal Asymmetries

28

Study of correlations between the scattered lepton and jet

Phys.Rev.Lett. 128 (2022) 13, 132002

29 of 62

Azimuthal Asymmetries

Final-state lepton and jet are mostly back-to-back

  • Imbalance can arise from perturbative initial/final-state radiation
  • Target the region dominated by soft gluon emissions: q⟂ ≪ P⟂
  • Provides information for TMD PDF measurements, where the soft gluon contribution can be factorized

29

  • kl⟂: transverse momentum of the scattered lepton
  • kJ⟂: transverse momentum of the jet

Measure: cos(ɸ), cos(2ɸ), cos(3ɸ)

Require q⟂ / P⟂ < 0.3

30 of 62

Azimuthal Asymmetries

Reuse previous results from PRL 128, 132002

  • Quantities previously unfolded

30

31 of 62

Results

31

Dedicated DIS generators describe the data well everywhere, especially Rapgap

Pythia predictions are not tuned to this data

GBW includes gluon saturation effects, while CT18A uses NLO TMD calculations with collinear PDFs; both are currently available only at low q⟂

arXiv:2412.14092, submitted to PLB

32 of 62

What if we unfold everything?

32

33 of 62

Experimental setup

Using 228 pb⁻¹ of data collected by the H1 experiment during 2006 and 2007 at a 318 GeV center-of-mass energy

33

Reconstructed hadrons using combined detector information: energy flow algorithm

27.5 GeV e± (k)

920 GeV p (P)

Q² = −q²

y = (P·q) / (P·k)

P: incoming proton 4-vector

k: incoming electron 4-vector

q = k − k′: 4-momentum transfer

Goal: Include the information of all reconstructed particles + scattered lepton in the collision
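The definitions above can be checked numerically for the HERA beams (masses neglected); the scattered-electron kinematics in the example are invented for illustration.

```python
import numpy as np

# Numerical check of the DIS kinematics for the HERA beam setup.
def mdot(a, b):
    # Minkowski product for (E, px, py, pz) with signature (+, -, -, -)
    return a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3]

k = np.array([27.5, 0.0, 0.0, -27.5])   # incoming electron along -z
P = np.array([920.0, 0.0, 0.0, 920.0])  # incoming proton along +z
s = mdot(k + P, k + P)
print(round(np.sqrt(s), 1))             # ~318 GeV center-of-mass energy

# hypothetical scattered electron: 15 GeV at a polar angle of 140 degrees
th = np.radians(140.0)
kp = 15.0 * np.array([1.0, np.sin(th), 0.0, np.cos(th)])
q = k - kp
Q2 = -mdot(q, q)
y = mdot(P, q) / mdot(P, k)
print(round(Q2, 1), round(y, 3))  # inside Q2 > 150 GeV2 and 0.2 < y < 0.7
```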

34 of 62

OmniLearn

34

We use the OmniLearn model to train the classifiers for the unfolding task:

  • Same Model used for the generation of EIC events, but now focused on classification

More details at: V. Mikuni, B. Nachman, Phys. Rev. D 111, L051504

35 of 62

Results

Cluster unfolded jets using kT algorithm with radius of 1.0

We are able to re-derive past results

35

Phys. Rev. Lett. 128 (2022) 132002
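The clustering step can be sketched with a naive O(n³) kT implementation; real analyses use FastJet, and the pt-weighted (eta, phi) recombination here is a simplification of full four-vector recombination.

```python
import numpy as np

# Minimal kT clustering sketch with R = 1.0. Particles are (pt, eta, phi);
# d_iB = pt_i^2, d_ij = min(pt_i, pt_j)^2 * dR_ij^2 / R^2.
def kt_cluster(particles, R=1.0):
    parts = [list(p) for p in particles]
    jets = []
    while parts:
        # smallest beam distance as the starting candidate
        best_d, best_pair = min((p[0] ** 2, (i, -1)) for i, p in enumerate(parts))
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                deta = parts[i][1] - parts[j][1]
                dphi = (parts[i][2] - parts[j][2] + np.pi) % (2 * np.pi) - np.pi
                d_ij = min(parts[i][0], parts[j][0]) ** 2 * (deta ** 2 + dphi ** 2) / R ** 2
                if d_ij < best_d:
                    best_d, best_pair = d_ij, (i, j)
        i, j = best_pair
        if j == -1:
            jets.append(parts.pop(i))  # beam distance wins: promote to a jet
        else:
            pj = parts.pop(j)          # j > i, so pop j first
            pi = parts.pop(i)
            pt = pi[0] + pj[0]         # simplified pt-weighted recombination
            parts.append([pt,
                          (pi[0] * pi[1] + pj[0] * pj[1]) / pt,
                          (pi[0] * pi[2] + pj[0] * pj[2]) / pt])
    return jets

# two collimated particles plus one far-away particle -> two jets
jets = kt_cluster([(10.0, 0.0, 0.0), (5.0, 0.1, 0.1), (8.0, 2.5, 2.0)])
print(len(jets), sorted(j[0] for j in jets))  # -> 2 [8.0, 15.0]
```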

36 of 62

Results

Cluster unfolded jets using kT algorithm with radius of 1.0

We are able to re-derive past results

36

Phys.Lett.B 844 (2023) 138101

37 of 62

Results

The Breit frame provides a natural frame to study ep collisions, where the struck quark forms a jet opposite to the proton beam: useful for jet and TMD studies

  • Starting from the lab frame, we need to boost the system: not trivial in terms of unfolding

37

38 of 62

Results

Cluster jets using kT algorithm with radius of 1.0

We can study observables in different frames!

38

Lab Frame

Breit Frame

39 of 62

Results

Unfold observables that are hard to unfold without machine learning: Energy-Energy Correlators (EECs)

39

Sensitive to transverse momentum dependent parton distribution functions and fragmentation functions

Eq. from Phys.Rev.D 103 (2021) 9, 094005

See also the talks from:

Simon, Ian, Jingyu
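The correlator itself is simple to compute once unbinned particle-level events are available: histogram all particle pairs, weighting each pair by its energy product. The isotropic toy events below are purely illustrative.

```python
import numpy as np

# Toy sketch of an energy-energy correlator: histogram all particle pairs,
# weighting each pair by E_i * E_j / E_tot^2, versus the opening angle chi.
rng = np.random.default_rng(4)

def eec(events, bins):
    hist = np.zeros(len(bins) - 1)
    for E, theta, phi in events:  # per-event arrays of particle kinematics
        v = np.stack([np.sin(theta) * np.cos(phi),
                      np.sin(theta) * np.sin(phi),
                      np.cos(theta)], axis=1)  # unit direction vectors
        i, j = np.triu_indices(len(E), k=1)
        chi = np.arccos(np.clip((v @ v.T)[i, j], -1.0, 1.0))
        hist += np.histogram(chi, bins, weights=E[i] * E[j] / E.sum() ** 2)[0]
    return hist / len(events)

# toy events: isotropic particles with exponentially distributed energies
events = [(rng.exponential(1.0, 20),
           np.arccos(rng.uniform(-1, 1, 20)),
           rng.uniform(0, 2 * np.pi, 20)) for _ in range(50)]
bins = np.linspace(0.0, np.pi, 25)
total = eec(events, bins).sum()
print(round(total, 3))  # sum of i<j pair weights per event, below 0.5
```

Because the EEC is a pair-weighted histogram rather than a per-event observable, unbinned ML unfolding is what makes it accessible: any binning in chi can be chosen after the fact.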

40 of 62

OmniLearn

Combine tasks: Train one model to classify and generate particles

  • Use transformers and graph neural networks to learn the representation of particle interactions: OmniLearn

40

More details at: V. Mikuni, B. Nachman, Phys. Rev. D 111, L051504

41 of 62

Strategy

41

Model starts with random weights

Ask the model to classify and generate particle collisions

Fine-tune the model on new datasets and tasks

Jet Tagging

Unfolding

Anomaly Detection

42 of 62

Jet Tagging

42

Source: ATL-PHYS-SLIDE-2023-048

Pushing classification performance requires lots of simulated data!

30M Jets

192M Jets

43 of 62

FastSim to FullSim

43

OmniLearn is trained on fast simulations. Fine-tune to ATLAS Top Tagging Open Data Set

Full simulation + Reconstruction

  • Matches ATLAS performance with 10% of the data

44 of 62

Improving Unfolding

44

Improved precision for unfolding

  • Training time reduced by a factor of 2!

45 of 62

Conclusions

45

  • Machine learning opens the path to new opportunities in collider physics
  • Full generation of particle collisions: fast and comprehensive parameter tuning
  • Full event unfolding: even observables defined in the future can be unfolded
  • Growing number of venues to present AI advances for HEP: AI4EIC, ML4Jets, ML4PS

46 of 62

THANKS!

Any questions?

46

47 of 62

Backup

47

48 of 62

EIC Events

48

We also compare with previous diffusion model based on images

  • Using the particles directly improves the quality and avoids the need for pixelization

P. Devlin, J.-W. Qiu, F. Ringer, N. Sato: Phys. Rev. D 110 (2024) 1, 016030

Araz, J. Y., Mikuni, V., Ringer, F., Sato, N., Acosta, F. T., Whitehill, R., Phys.Lett.B 868 (2025) 139694

49 of 62

Systematic uncertainties

Systematic uncertainties

  • HFS energy scale: ±1%
  • HFS azimuthal angle: ±20 mrad
  • Lepton energy: ±0.5%
  • Lepton azimuthal angle: ±1 mrad
  • Model uncertainty: differences in unfolded results between Djangoh and Rapgap
  • Non-closure uncertainty: differences between the expected and obtained values of the closure test

49

Unfolding uncertainties

50 of 62

MC Generators

Lund string hadronization model and CTEQ6L PDF set:

  • Djangoh: LO neutral-current DIS with the colour-dipole model from Ariadne
  • Rapgap: LO DIS with parton showers in the leading-log approximation

Pythia 8.3: default NNPDF3.1 PDF

  • Vincia: pT-ordered antenna shower and NNPDF3.1 PDF
  • Dire: dipole model, similar to Ariadne, with the MMHT14nlo68cl PDF

Herwig 7.2: cluster hadronization and the CT14 PDF set

50

51 of 62

Phi dependence

51

52 of 62

Experimental setup

Experimental setup

Fiducial Phase space definition:

  • 0.2 < y < 0.7
  • Q² > 150 GeV²

Particle selection:

  • pT > 0.1 GeV
  • -1 < 𝜂lab < 2.75
  • Charge information used if 𝜂lab < 2

Reco Phase space definition:

  • 0.08 < y < 0.7
  • Q² > 150 GeV²
  • pT,miss < 10 GeV
  • 45 < Σ(E − pz) < 65 GeV

Particle selection:

  • pT > 0.1 GeV
  • -1 < 𝜂lab < 2.75

  • Pass reco selection: Red -> Orange: 77%
  • Pass fiducial selection: Red -> Blue: 58%
  • Pass fiducial and reco selection: Blue -> Orange: 96%
  • Don’t pass fiducial but pass reco: Red -> Orange (without blue): 50%

52

Q² > 100 GeV²

53 of 62

Closure test

  • Use Djangoh as the pseudo-data and unfold Rapgap

53

  • Features used during the unfolding:
    • Kinematic information of all hadrons and scattered lepton

54 of 62

Pretraining

54

We would like to unfold up to 130 × 3 = 390 features simultaneously: this requires lots of data

  • Our data sample is around 500k events, but we have 20M simulated events from 2 different simulators
  • Idea: pretrain a model using only simulations and then fine-tune this model with data
  • Use this model as the starting point for all the trainings needed for the unfolding
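The pretrain-then-fine-tune idea can be illustrated with a hand-rolled logistic regression; this is a sketch of the strategy, not the actual OmniLearn setup.

```python
import numpy as np

# Toy of pretrain-then-fine-tune: "pretraining" on abundant simulation
# gives a starting point so only a few cheap steps on the small dataset
# are needed.
rng = np.random.default_rng(5)
d = 10
w_true = rng.normal(size=d)
w_true *= 3.0 / np.linalg.norm(w_true)  # fix the separation strength

def make(n):
    # labels from a noisy linear rule, shared by "simulation" and "data"
    x = rng.normal(size=(n, d))
    y = (x @ w_true + rng.normal(size=n) > 0).astype(float)
    return x, y

def train(x, y, w, steps, lr=0.1):
    for _ in range(steps):  # plain gradient descent on the BCE loss
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w = w - lr * x.T @ (p - y) / len(y)
    return w

x_sim, y_sim = make(100_000)  # large simulated sample
x_dat, y_dat = make(500)      # small "data" sample
w_pre = train(x_sim, y_sim, np.zeros(d), steps=300)  # pretraining
w_fin = train(x_dat, y_dat, w_pre, steps=20)         # cheap fine-tuning
w_scr = train(x_dat, y_dat, np.zeros(d), steps=20)   # same budget, from scratch

x_te, y_te = make(20_000)
a_fin = np.mean(((x_te @ w_fin) > 0) == y_te)
a_scr = np.mean(((x_te @ w_scr) > 0) == y_te)
print(round(a_fin, 3), round(a_scr, 3))  # pretrained start typically wins
```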

55 of 62

Generative models

55

Model       Training Stability   Scalability   Fast inference   Fidelity   Expressivity

Diffusion   Yes                  Yes           No               Yes        Yes
GANs        No                   Yes           Yes              Maybe      Yes
VAE         Maybe                Yes           Yes              Maybe      Kinda
NF          Yes                  Maybe         Maybe            Yes        Kinda

  • GANs:
    • Modern GAN architectures haven't really been explored in HEP; mostly the vanilla ones have been used, with OK results
  • VAE:
    • The KL divergence can behave poorly when the generator output changes too quickly during training, and often needs regularization
    • The reconstruction loss is often taken as the MSE, which learns only averages and makes sharp distributions blurry. For images there are tailored losses that improve this behaviour
  • NF:
    • Since the transformation needs to be invertible, bottleneck layers cannot be used, requiring very large networks even for small problems. This can be mitigated by splitting into multiple smaller networks
    • Autoregressive flows are among the best density estimators, but alone are very slow either to train or to sample (O(d²) in the slow direction); this can be overcome with distillation models

56 of 62

Score matching/denoising/diffusion

Denoising diffusion models are the current state-of-the-art generative models for image generation.

Pros:

  • Stable training: convex loss function
  • Scalability: network complexity is more sensitive to the architecture than to the dimensionality
  • Access to the data likelihood after training: similar to NFs, but the overall normalization is not required during training

Cons:

  • Slow sampling: possibly thousands of model evaluations to generate realistic images
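The cost of sampling comes from running a long chain of small steps, each requiring one score evaluation. In the 1D sketch below the score of a Gaussian target is known in closed form and stands in for the trained network.

```python
import numpy as np

# Why sampling is slow: ~1000 "model" evaluations per generated sample.
rng = np.random.default_rng(6)

def score(x):
    # d/dx log N(x | mu=2, sigma=0.5), playing the role of the network
    return -(x - 2.0) / 0.5 ** 2

x = rng.normal(size=50_000)  # start from easy-to-sample noise
eps, n_steps = 1e-3, 1000
for _ in range(n_steps):     # unadjusted Langevin dynamics
    x = x + eps * score(x) + np.sqrt(2.0 * eps) * rng.normal(size=x.shape)
print(round(x.mean(), 2), round(x.std(), 2))  # approaches mean 2, std 0.5
```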

56

57 of 62

Score-matching

57

  • The common choice for 𝜆(t) is 𝛔(t)², resulting in the denoising loss function
  • Another important result is that choosing 𝜆(t) = g(t)² yields an upper bound on the data likelihood
  • This allows maximum-likelihood training of diffusion models!
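The 𝜆(t) = 𝛔(t)² choice can be checked numerically at a single noise level: with this weighting the denoising objective becomes a simple noise-prediction regression, ‖𝛔·s(x_t) + ε‖², whose minimizer is the true score. The 1D toy below has an analytic answer.

```python
import numpy as np

# Toy check: data x0 ~ N(0,1) smeared to x_t = x0 + sigma*eps gives
# p_t = N(0, 1 + sigma^2), so the exact score is s(x) = -x / (1 + sigma^2).
rng = np.random.default_rng(8)
sigma = 0.7
x0 = rng.normal(size=500_000)
eps = rng.normal(size=x0.shape)
xt = x0 + sigma * eps  # one fixed noise level of the forward process

def weighted_dsm_loss(c):
    # candidate score s(x) = -c * x; with lambda = sigma^2 the denoising
    # score-matching loss becomes || sigma * s(x_t) + eps ||^2
    return np.mean((sigma * (-c * xt) + eps) ** 2)

scales = np.linspace(0.2, 1.4, 25)
best = scales[np.argmin([weighted_dsm_loss(c) for c in scales])]
print(round(best, 2), round(1.0 / (1.0 + sigma ** 2), 2))  # minimum near the true slope
```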

58 of 62

Likelihood estimation?

  • Data generation can also be achieved by solving the associated ODE
    • Often leads to worse samples compared to Langevin-dynamics generation
  • On the other hand, we can also use the deterministic ODE to recover the data density!

58

SDE

ODE

59 of 62

Introduction

59

60 of 62

Anomaly Detection

60

Bump-hunting using ML:

  • Use the background in the sideband to estimate the background in the signal region
  • Compare the estimated background with the data
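The sideband idea can be sketched on a toy falling "mass" spectrum: interpolate the sideband yields into the signal region and compare with the observed count. All numbers below are invented for illustration.

```python
import numpy as np

# Toy bump hunt with a sideband background estimate.
rng = np.random.default_rng(7)
bkg = 1000.0 + rng.exponential(scale=500.0, size=200_000)  # smooth background
sig = rng.normal(loc=2000.0, scale=30.0, size=800)         # small resonance
mass = np.concatenate([bkg, sig])

lo, hi = 1900.0, 2100.0                            # signal region
n_left = np.sum((mass > lo - 200) & (mass < lo))   # left sideband
n_right = np.sum((mass > hi) & (mass < hi + 200))  # right sideband
# the geometric mean interpolates exactly for an exponential falloff
expected = np.sqrt(n_left * n_right)
observed = np.sum((mass > lo) & (mass < hi))
z = (observed - expected) / np.sqrt(expected)
print(int(observed), int(expected), round(z, 1))  # excess well above 3 sigma
```

The ML versions replace the histogram interpolation with generative models of the sideband and classifiers comparing data to the estimated background, but the logic is the same.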

61 of 62

Anomaly Detection

61

Bump-hunting using ML:

  • Generative Model
  • Classifier

62 of 62

Anomaly Detection

62

  • Generate the full dijet system
  • Classify data from background

OmniLearn requires 4 times less data to identify anomalies!