1 of 23

AI/ML Techniques Overview in Neutrino Physics

Patrick de Perio (Kavli IPMU)

Kazu Terao (SLAC)

FAIRS-Japan @ KMI

2 of 23

AI/ML Applications in Neutrino Physics

FAIRS-Japan, Dec. 3-5, 2024 AI/ML in Neutrino 2

  • Reconstruction
  • Surrogate Models
  • Simulation
  • Domain Adaptation

Past workshops on Neutrino Physics Machine Learning (NPML): 2020 (Remote), 2023 (Tufts), 2024 (Zurich)

3 of 23

Data Reconstruction and Analysis


[Figure: CNN classification pipeline. Step 2: convolutions & down-samples over an image volume (height, width, depth, features; repeated). Step 3: a fully connected neural network producing "softmax discriminators" P(μ±), P(e±), P(π0), P(γ).]

Many applications of Convolutional/Graph Neural Networks

  • Supervised learning using truth information from high fidelity simulation
  • Input data pre-processed into an image or a graph format
    • Can come with bias/loss of information, or significant computation
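The "softmax discriminators" step can be sketched numerically (a minimal NumPy example; the logit values and class ordering are purely illustrative):

```python
import numpy as np

def softmax(logits):
    """Convert raw network outputs (logits) into class probabilities."""
    z = logits - np.max(logits)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical final-layer outputs for one event, one logit per hypothesis:
# mu+-, e+-, pi0, gamma
logits = np.array([2.0, 0.5, -1.0, 0.1])
probs = softmax(logits)           # sums to 1; highest logit -> highest probability
```

In a real model the logits come from the fully connected layers after the convolutional feature extractor; the softmax output is what gets cut on in the analysis.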

[Figures: softmax P(γ) vs. distance to detector wall (cm) for νe CC0π, NC γ, and NC π0 samples; e/μ identification example. Pipeline step 1: pre-processing.]

4 of 23

Data Reconstruction and Analysis


Many applications of Convolutional/Graph Neural Networks

  • Supervised learning using truth information from high fidelity simulation
  • Input data pre-processed into an image or a graph format
    • Can come with bias/loss of information, or significant computation
  • Models that exploit geometrical symmetries (invariance/equivariance)

Spherical CNN and KamNet in KamLAND

E3NN: Euclidean Neural Nets and on-going application to LArTPC

Euclidean (3) equivariant neural network model for translation, rotation, and mirroring
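To see why such symmetries matter, a toy NumPy check (not the E3NN implementation): features built from pairwise distances are unchanged under any E(3) transformation, i.e. rotation, translation, and mirroring:

```python
import numpy as np

def pairwise_distances(points):
    """E(3)-invariant summary: distances survive rotation/translation/mirroring."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))          # a toy 3D point cloud (e.g. hit positions)

# Random orthogonal transform (rotation or rotoreflection) plus a translation
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
transformed = pts @ q.T + np.array([1.0, -2.0, 0.5])

# The invariant features agree to numerical precision
assert np.allclose(pairwise_distances(pts), pairwise_distances(transformed))
```

Equivariant architectures build this kind of guarantee into every layer instead of hoping the network learns it from data.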

5 of 23

Data Reconstruction and Analysis


Many applications of Convolutional/Graph Neural Networks

  • Supervised learning using truth information from high fidelity simulation
  • Input data pre-processed into an image or a graph format
    • Can come with bias/loss of information, or significant computation
  • Models that exploit geometrical symmetries (invariance/equivariance)
  • Composite models for end-to-end object reconstruction

6 of 23

Surrogate Models

Neural surrogate models are used in many parts of simulation

  • To speed up simulation of particle interactions within a nucleus (i.e. many-body system)
  • Photon propagation and other detector physics processes
  • Detector response to high-level input parameters (particle kinematics). Orders-of-magnitude speed-ups allow the fast surrogate simulator to be integrated into reconstruction.
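The idea in miniature: a surrogate is just a cheap learned map from input parameters to detector observables. An untrained toy MLP with random weights (input/output dimensions are assumptions, purely illustrative):

```python
import numpy as np

def mlp_surrogate(kinematics, w1, b1, w2, b2):
    """Tiny MLP: particle kinematics -> predicted detector response.
    A trained version stands in for the expensive physics simulation."""
    h = np.tanh(kinematics @ w1 + b1)   # hidden layer
    return h @ w2 + b2                  # e.g. predicted charge per PMT group

rng = np.random.default_rng(1)
w1, b1 = rng.normal(size=(4, 16)), np.zeros(16)   # 4 inputs: energy + direction
w2, b2 = rng.normal(size=(16, 8)), np.zeros(8)    # 8 illustrative outputs

x = np.array([[1.2, 0.0, 0.0, 1.0]])              # one particle's kinematics
y = mlp_surrogate(x, w1, b1, w2, b2)              # microseconds vs. full simulation
```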


7 of 23

Simulation - ML for Physics Modeling


A traditional physics simulator requires a manual process to optimize against data using separate software (e.g., calibration, reconstruction). ML-based approaches can automate this process and/or add the flexibility to learn and represent missing physics models from real data.

  • Diffusion models for generating images with qualities comparable to high fidelity simulator

8 of 23

Simulation - ML for Physics Modeling


A traditional physics simulator requires a manual process to optimize against data using separate software (e.g., calibration, reconstruction). ML-based approaches can automate this process and/or add the flexibility to learn and represent missing physics models from real data.

  • Diffusion models for generating images with qualities comparable to high fidelity simulator
  • Differentiable physics simulator enables gradient-based optimization to solve inverse problems
    • Can simulate (forward) or calibrate/reconstruct (backward) with automated optimization
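The forward/backward loop can be sketched with a toy one-parameter "simulator" and a hand-written gradient (real implementations use automatic differentiation, e.g. in JAX, so no gradients are written by hand):

```python
import numpy as np

def simulate(theta):
    """Toy differentiable forward model: parameter -> predicted observable."""
    return np.sin(theta) + 0.5 * theta

def grad_loss(theta, observed):
    """Analytic gradient of 0.5*(simulate(theta) - observed)**2 w.r.t. theta."""
    residual = simulate(theta) - observed
    return residual * (np.cos(theta) + 0.5)

true_theta = 0.8
observed = simulate(true_theta)      # "data" produced by the true parameter

theta = 0.0                          # initial guess
for _ in range(200):                 # gradient descent: the "backward" direction
    theta -= 0.1 * grad_loss(theta, observed)
```

The same machinery solves calibration (fit detector parameters to data) and reconstruction (fit particle parameters to an event) as inverse problems.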

[Figure: differentiable simulator workflow. Model evaluation maps parameter θ and input x to outputs and objectives; optimization proceeds from an initial guess to a final prediction, comparing true and predicted track trajectories, predicted detector hits, and photon trajectories.]

9 of 23

Domain Adaptation (Fighting Data Shift) - DAT


Simulation is largely accurate but not perfect. Optimizing a model on simulation and then applying it to real data can result in data shift; as a consequence, the model may underperform on data.

  • Domain adversarial training: force the model to use only common features between two data/sim

[Figure: two domains, A and B. Force the model to learn only features common across both domains. Examples on MINERvA and ICARUS.]

How? Add a task to classify the two domains, and maximize its error while minimizing the task (label) error. This pressures the model to learn only features common to both domains.
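The sign flip at the heart of this scheme (the effect of a gradient-reversal layer on the shared features) can be shown with a minimal sketch; the loss values below are illustrative:

```python
def dat_objective(task_loss, domain_loss, lam=1.0):
    """Domain-adversarial objective (sketch): minimize the task (label) error
    while MAXIMIZING the domain classifier's error. The minus sign is what a
    gradient-reversal layer effectively applies to the shared features."""
    return task_loss - lam * domain_loss

# If the domain classifier separates sim from data easily (low domain_loss),
# the combined objective is high: pressure to remove domain-specific features.
easy_to_tell_apart = dat_objective(task_loss=0.3, domain_loss=0.1)
hard_to_tell_apart = dat_objective(task_loss=0.3, domain_loss=0.7)
assert easy_to_tell_apart > hard_to_tell_apart
```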

10 of 23

Domain Adaptation (Fighting Data Shift) - Contrastive Learning


Simulation is largely accurate but not perfect. Optimizing a model on simulation and then applying it to real data can result in data shift; as a consequence, the model may underperform on data.

  • Domain adversarial training: force the model to use only common features between two data/sim
  • Pre-training and fine-tuning: self-supervision on real data, fine-tune with small labeled samples

Augment data to make the model learn about common underlying (unchanged) features

Contrastive Learning

Image credit: Alexander W. (UCL), talk at NPML (2024)
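A minimal NumPy sketch of the contrastive objective (InfoNCE-style; batch size, embedding dimension, and temperature are arbitrary choices, not a specific experiment's setup):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: each row of z1 should match the
    same row of z2 (two augmentations of the same event) and repel the rest."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature         # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))        # positives sit on the diagonal

rng = np.random.default_rng(2)
z = rng.normal(size=(8, 4))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 4)))   # augmented pairs
random_pairs = info_nce(z, rng.normal(size=(8, 4)))         # unrelated pairs
assert aligned < random_pairs
```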

11 of 23

Domain Adaptation (Fighting Data Shift)


Simulation is largely accurate but not perfect. Optimizing a model on simulation and then applying it to real data can result in data shift; as a consequence, the model may underperform on data.

  • Domain adversarial training: force the model to use only common features between two data/sim
  • Pre-training and fine-tuning: self-supervision on real data, fine-tune with small labeled samples

Reconstruction

Track vs. shower pixel-level separation

12 of 23

Summary

  • Lots of work in simulation, reconstruction, and domain adaptation
    • Deep learning models (mostly supervised), exploitation of symmetries using equivariant/invariant operations, neural surrogates, generative models, contrastive learning and mask-based self-supervision
  • Topics gaining more traction:
    • High-dimensional unfolding using simulation-based inference
    • AI/ML for experiment design/operation optimization
    • AI/ML for human support: hazard detection, quality control, detector building process, communications, issue diagnosis, etc.


13 of 23

Appendix


14 of 23

Simulation - Differentiable Physics Modeling

  • White-box differentiable simulation enables interpretable parameter optimization for calibration and reconstruction
  • Automatic differentiation provides exact gradients
  • Ideal for high-dimensional calibration problems
  • GPU-accelerated: O(ms) per event of 1M photons

Implementation Highlights:

  • JAX framework enables native automatic differentiation
  • Spatial grid filtering reduces PMT checks from a naive 10k to ~10 per photon
  • Modular architecture enables seamless parameter extension
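The grid-filtering idea can be sketched as follows (a simplified NumPy version; the real JAX implementation differs in detail, e.g. it would also search neighboring cells):

```python
import numpy as np
from collections import defaultdict

def build_grid(pmt_positions, cell_size):
    """Hash each PMT into a coarse 3D grid cell so a photon only needs to
    check PMTs near its own cell instead of scanning the full list."""
    grid = defaultdict(list)
    for i, p in enumerate(pmt_positions):
        grid[tuple((p // cell_size).astype(int))].append(i)
    return grid

def candidate_pmts(grid, photon_pos, cell_size):
    """Return PMT indices in the photon's grid cell (a real system would
    also check the 26 neighboring cells)."""
    return grid[tuple((photon_pos // cell_size).astype(int))]

rng = np.random.default_rng(3)
pmts = rng.uniform(0, 100, size=(10_000, 3))       # 10k illustrative PMTs
grid = build_grid(pmts, cell_size=10.0)

photon = np.array([42.0, 7.0, 88.0])
cands = candidate_pmts(grid, photon, cell_size=10.0)
assert len(cands) < len(pmts)                       # far fewer checks per photon
```

With 1000 cells over 10k PMTs, each lookup touches about ten candidates on average, matching the reduction quoted above.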


Detector response showing accumulated charge in PMTs (1M photons)

Loss landscape when varying track position and opening angle; white streamlines show computed gradient directions

O. Alterkait

15 of 23

Good practices

Dr. Saúl Alonso-Monsalve – ETH Zurich


  • Typical “conceptual” issues in neutrinos:
    • Choosing the wrong method (not all problems need a neural network).
    • When working with neural networks, implementing a suboptimal architecture (e.g., limiting the net’s receptive field through the architecture).
    • Turning a regression problem into a classification one (i.e., discretising a continuous output).
    • Not leveraging state-of-the-art methods (not always included in TMVA…).

Preventing common mistakes.

16 of 23

ML model uncertainty

Dr. Saúl Alonso-Monsalve – ETH Zurich


  • It is not always enough to make precise predictions.
    • In physics analysis, it’s typically necessary to quantify the uncertainty in the ML predictions.

  • An ensemble of models might be sufficient.

  • Some approaches introduce probabilistic components into the models.
  • Uncertainty Propagation & Estimation: Link to talk.
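The ensemble option above can be sketched directly: the spread of predictions across independently trained models serves as the uncertainty estimate (the prediction values below are stand-ins):

```python
import numpy as np

# Stand-in predictions from 5 independently trained models for 3 events
# (in practice each row comes from a model trained with a different seed).
ensemble_preds = np.array([
    [0.91, 0.40, 0.10],
    [0.93, 0.55, 0.12],
    [0.90, 0.35, 0.09],
    [0.94, 0.60, 0.11],
    [0.92, 0.45, 0.10],
])

mean = ensemble_preds.mean(axis=0)   # central prediction per event
std = ensemble_preds.std(axis=0)     # disagreement = uncertainty estimate

# The ensemble disagrees most on the second event: the least trustworthy score
assert std[1] > std[0] and std[1] > std[2]
```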

17 of 23

Problem

  • Neutrino generators (e.g., GENIE, NEUT) are great, but not perfect.
    • They rely on a variety of theoretical models and assumptions to simulate the complex interactions of neutrinos with matter (e.g. determining the final-state particles).
    • Other uncertainties can be fixed by tuning the simulation with calibration data.

  • Possible solution: train only on GEANT4 for controlled single-particle simulations.
    • Particle gun (PGUN) or particle bomb (PBomb) samples (must cover the unknown detector data distribution).
    • Provides precise control over initial conditions.
    • Facilitates systematic study of detector response.

Dr. Saúl Alonso-Monsalve – ETH Zurich


Can we trust a ML model?

18 of 23

Domain adaptation

  • With the solution described in the previous slide, one needs to spend considerable effort defining the PGUN/PBomb samples needed for training/testing.

  • A small shift in your distribution can make your ML model perform poorly.

  • Solution: domain adaptation models.

Dr. Saúl Alonso-Monsalve – ETH Zurich


[Figure: two domains, A and B. Either force the model to learn only features common across both domains, or force a domain shift from one domain onto the other, through meta-learning, contrastive learning, differentiable simulations, etc.]

19 of 23

Black box?

Dr. Saúl Alonso-Monsalve – ETH Zurich


20 of 23

Example: understanding a trained model (DUNE CVN)

Dr. Saúl Alonso-Monsalve – ETH Zurich


  • Occlusion tests:
    • Hide parts of the images and check how the CVN reacts to the changes.
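The occlusion test in miniature: slide a mask over the input, re-evaluate the classifier, and map where the score drops (the scoring function below is a stand-in, not the DUNE CVN):

```python
import numpy as np

def toy_score(image):
    """Stand-in for a trained classifier score: responds only to total charge
    in the left half, mimicking sensitivity to a shower start."""
    return image[:, :4].sum()

def occlusion_map(image, score_fn, patch=2):
    """Score drop when each patch is zeroed out -> importance heat map."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0.0
            heat[i // patch, j // patch] = base - score_fn(masked)
    return heat

img = np.ones((8, 8))
heat = occlusion_map(img, toy_score)
# Occluding the left half (which drives the score) matters; the right half doesn't
assert heat[:, :2].min() > 0 and np.allclose(heat[:, 2:], 0)
```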

[Figure: occlusion tests on an electron neutrino (νe) event and a muon neutrino (νμ) event, each showing the original image and its occlusion map.]

Removing the start of the electron shower reduces the 𝜈e score, as expected

The CVN finds the vertex a bit ambiguous, but it is using the end point of the muon to gain a handle on the event type.

21 of 23

Topic 2:

Electron vs. multi-𝛾 event classification


Maksimovic et al., J. Cosmol. Astropart. Phys. 051 (2021)

  • Input PMT time and charge hitmap images

S. Fujita, S. Han, Y. Koshio

22 of 23

Topic 3:

Muon track reconstruction


  • DETR model (https://github.com/facebookresearch/detr)
    • Feature extractor
      • CNN (ResNet) backbone
      • Transformer encoder-decoder
    • Prediction head ← Modify this part for muon detection
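Swapping the prediction head amounts to keeping DETR's per-query decoder embeddings and attaching a new map to muon-track parameters. A NumPy stand-in (the dimensions and the choice of track parameterization are assumptions, not the actual modification):

```python
import numpy as np

def muon_head(query_embeddings, w_track, w_logit):
    """Replacement prediction head (sketch): each decoder query embedding ->
    one candidate muon track (entry xyz, exit xyz) plus a presence logit."""
    tracks = query_embeddings @ w_track      # (queries, 6): entry + exit points
    logits = query_embeddings @ w_logit      # (queries,): is this a real muon?
    return tracks, logits

rng = np.random.default_rng(4)
queries = rng.normal(size=(100, 256))        # DETR-like: 100 queries, dim 256
w_track = rng.normal(size=(256, 6)) * 0.01   # illustrative untrained weights
w_logit = rng.normal(size=(256,)) * 0.01

tracks, logits = muon_head(queries, w_track, w_logit)
```

The CNN backbone and transformer encoder-decoder stay as in the original model; only this final mapping (and the matching loss) changes for muon detection.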

https://arxiv.org/abs/2005.12872

S. Fujita, S. Han, Y. Koshio

23 of 23

Uses of machine learning in SK analyses

  • Recently, the number of analyses involving ML has been increasing
    • We need to develop large-scale machine learning infrastructure (GPU access, ML libraries, etc.) to support these efforts
  • Machine learning models are trained on MC samples
    • Validation using real data is critical, but appropriate data samples are often difficult to obtain
    • We need to utilize and/or newly develop appropriate calibration sources


S. Fujita, S. Han, Y. Koshio