1 of 23

AI/ML Techniques Overview in Neutrino Physics

Patrick de Perio (Kavli IPMU)

Kazu Terao (SLAC)

FAIRS-Japan @ KMI

2 of 23

AI/ML Applications in Neutrino Physics

FAIRS-Japan, Dec. 3-5, 2024 AI/ML in Neutrino 2

  • Reconstruction
  • Surrogate Models
  • Simulation
  • Domain Adaptation

Past workshops on Neutrino Physics Machine Learning (NPML): 2020 (Remote), 2023 (Tufts), 2024 (Zurich)

3 of 23

Data Reconstruction and Analysis


[Figure: CNN classification pipeline. Step 2: convolutions & down-samples over an image volume (height, width, depth, features; repeated). Step 3: a fully connected neural network producing "softmax discriminators" P(μ±), P(e±), P(π0), P(γ).]

Many applications of Convolutional/Graph Neural Networks

  • Supervised learning using truth information from high fidelity simulation
  • Input data pre-processed into an image or a graph format
    • Can come with bias/loss of information, or significant computation
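The "softmax discriminators" step can be sketched numerically (a minimal NumPy example; the logit values and class ordering are purely illustrative):

```python
import numpy as np

def softmax(logits):
    """Convert raw network outputs (logits) into class probabilities."""
    z = logits - np.max(logits)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical final-layer outputs for one event, one logit per hypothesis:
# mu+-, e+-, pi0, gamma
logits = np.array([2.0, 0.5, -1.0, 0.1])
probs = softmax(logits)           # sums to 1; highest logit -> highest probability
```

In a real model the logits come from the fully connected layers after the convolutional feature extractor; the softmax output is what gets cut on in the analysis.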

[Figures: softmax P(γ) vs. distance to detector wall (cm) for νe CC0π, NC γ, and NC π0 samples; e/μ identification example. Pipeline step 1: pre-processing.]

4 of 23

Data Reconstruction and Analysis


Many applications of Convolutional/Graph Neural Networks

  • Supervised learning using truth information from high fidelity simulation
  • Input data pre-processed into an image or a graph format
    • Can come with bias/loss of information, or significant computation
  • Models that exploit geometrical symmetries (invariance/equivariance)

Spherical CNN and KamNet in KamLAND

E3NN: Euclidean Neural Nets and on-going application to LArTPC

Euclidean (3) equivariant neural network model for translation, rotation, and mirroring
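To see why such symmetries matter, a toy NumPy check (not the E3NN implementation): features built from pairwise distances are unchanged under any E(3) transformation, i.e. rotation, translation, and mirroring:

```python
import numpy as np

def pairwise_distances(points):
    """E(3)-invariant summary: distances survive rotation/translation/mirroring."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 3))          # a toy 3D point cloud (e.g. hit positions)

# Random orthogonal transform (rotation or rotoreflection) plus a translation
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
transformed = pts @ q.T + np.array([1.0, -2.0, 0.5])

# The invariant features agree to numerical precision
assert np.allclose(pairwise_distances(pts), pairwise_distances(transformed))
```

Equivariant architectures build this kind of guarantee into every layer instead of hoping the network learns it from data.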

5 of 23

Data Reconstruction and Analysis


Many applications of Convolutional/Graph Neural Networks

  • Supervised learning using truth information from high fidelity simulation
  • Input data pre-processed into an image or a graph format
    • Can come with bias/loss of information, or significant computation
  • Models that exploit geometrical symmetries (invariance/equivariance)
  • Composite models for end-to-end object reconstruction

6 of 23

Surrogate Models

Neural surrogate models are used in many parts of simulation

  • To speed up simulation of particle interactions within a nucleus (i.e. many-body system)
  • Photon propagation and other detector physics processes
  • Detector response to high-level input parameters (particle kinematics). Orders-of-magnitude speed-ups allow the fast surrogate simulator to be integrated into reconstruction.
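The idea in miniature: a surrogate is just a cheap learned map from input parameters to detector observables. An untrained toy MLP with random weights (input/output dimensions are assumptions, purely illustrative):

```python
import numpy as np

def mlp_surrogate(kinematics, w1, b1, w2, b2):
    """Tiny MLP: particle kinematics -> predicted detector response.
    A trained version stands in for the expensive physics simulation."""
    h = np.tanh(kinematics @ w1 + b1)   # hidden layer
    return h @ w2 + b2                  # e.g. predicted charge per PMT group

rng = np.random.default_rng(1)
w1, b1 = rng.normal(size=(4, 16)), np.zeros(16)   # 4 inputs: energy + direction
w2, b2 = rng.normal(size=(16, 8)), np.zeros(8)    # 8 illustrative outputs

x = np.array([[1.2, 0.0, 0.0, 1.0]])              # one particle's kinematics
y = mlp_surrogate(x, w1, b1, w2, b2)              # microseconds vs. full simulation
```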


7 of 23

Simulation - ML for Physics Modeling


A traditional physics simulator requires a manual process to optimize against data using separate software (e.g., calibration, reconstruction). ML-based approaches can automate this process and/or add the flexibility to learn and represent missing physics models from real data.

  • Diffusion models for generating images with qualities comparable to high fidelity simulator

8 of 23

Simulation - ML for Physics Modeling


A traditional physics simulator requires a manual process to optimize against data using separate software (e.g., calibration, reconstruction). ML-based approaches can automate this process and/or add the flexibility to learn and represent missing physics models from real data.

  • Diffusion models for generating images with qualities comparable to high fidelity simulator
  • Differentiable physics simulator enables gradient-based optimization to solve inverse problems
    • Can simulate (forward) or calibrate/reconstruct (backward) with automated optimization
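The forward/backward loop can be sketched with a toy one-parameter "simulator" and a hand-written gradient (real implementations use automatic differentiation, e.g. in JAX, so no gradients are written by hand):

```python
import numpy as np

def simulate(theta):
    """Toy differentiable forward model: parameter -> predicted observable."""
    return np.sin(theta) + 0.5 * theta

def grad_loss(theta, observed):
    """Analytic gradient of 0.5*(simulate(theta) - observed)**2 w.r.t. theta."""
    residual = simulate(theta) - observed
    return residual * (np.cos(theta) + 0.5)

true_theta = 0.8
observed = simulate(true_theta)      # "data" produced by the true parameter

theta = 0.0                          # initial guess
for _ in range(200):                 # gradient descent: the "backward" direction
    theta -= 0.1 * grad_loss(theta, observed)
```

The same machinery solves calibration (fit detector parameters to data) and reconstruction (fit particle parameters to an event) as inverse problems.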

[Figure: differentiable simulator workflow. Model evaluation maps parameter θ and input x to outputs and objectives; optimization proceeds from an initial guess to a final prediction, comparing true and predicted track trajectories, predicted detector hits, and photon trajectories.]

9 of 23

Domain Adaptation (Fighting Data Shift) - DAT


Simulation is largely accurate but not perfect. Optimizing a model on simulation and then applying it to real data can result in data shift; as a consequence, the model may underperform on data.

  • Domain adversarial training: force the model to use only common features between two data/sim

[Figure: two domains, A and B. Force the model to learn only features common across both domains. Examples on MINERvA and ICARUS.]

How? Add a task to classify the two domains, and maximize its error while minimizing the task (label) error. This pressures the model to learn only features common to both domains.
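The sign flip at the heart of this scheme (the effect of a gradient-reversal layer on the shared features) can be shown with a minimal sketch; the loss values below are illustrative:

```python
def dat_objective(task_loss, domain_loss, lam=1.0):
    """Domain-adversarial objective (sketch): minimize the task (label) error
    while MAXIMIZING the domain classifier's error. The minus sign is what a
    gradient-reversal layer effectively applies to the shared features."""
    return task_loss - lam * domain_loss

# If the domain classifier separates sim from data easily (low domain_loss),
# the combined objective is high: pressure to remove domain-specific features.
easy_to_tell_apart = dat_objective(task_loss=0.3, domain_loss=0.1)
hard_to_tell_apart = dat_objective(task_loss=0.3, domain_loss=0.7)
assert easy_to_tell_apart > hard_to_tell_apart
```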

10 of 23

Domain Adaptation (Fighting Data Shift) - Contrastive Learning


Simulation is largely accurate but not perfect. Optimizing a model on simulation and then applying it to real data can result in data shift; as a consequence, the model may underperform on data.

  • Domain adversarial training: force the model to use only common features between two data/sim
  • Pre-training and fine-tuning: self-supervision on real data, fine-tune with small labeled samples

Augment data to make the model learn about common underlying (unchanged) features

Contrastive Learning

Image credit: Alexander W. (UCL), talk at NPML (2024)
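A minimal NumPy sketch of the contrastive objective (InfoNCE-style; batch size, embedding dimension, and temperature are arbitrary choices, not a specific experiment's setup):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: each row of z1 should match the
    same row of z2 (two augmentations of the same event) and repel the rest."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature         # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))        # positives sit on the diagonal

rng = np.random.default_rng(2)
z = rng.normal(size=(8, 4))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 4)))   # augmented pairs
random_pairs = info_nce(z, rng.normal(size=(8, 4)))         # unrelated pairs
assert aligned < random_pairs
```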

11 of 23

Domain Adaptation (Fighting Data Shift)


Simulation is largely accurate but not perfect. Optimizing a model on simulation and then applying it to real data can result in data shift; as a consequence, the model may underperform on data.

  • Domain adversarial training: force the model to use only common features between two data/sim
  • Pre-training and fine-tuning: self-supervision on real data, fine-tune with small labeled samples

Reconstruction

Track vs. shower pixel-level separation

12 of 23

Summary

  • Lots of work in simulation, reconstruction, and domain adaptation
    • Deep learning models (mostly supervised), exploitation of symmetries using equivariant/invariant operations, neural surrogates, generative models, contrastive learning and mask-based self-supervision
  • Topics gaining more traction:
    • High-dimensional unfolding using simulation-based inference
    • AI/ML for experiment design/operation optimization
    • AI/ML for human support: hazard detection, quality control, detector building process, communications, issue diagnosis, etc.


13 of 23

Appendix


14 of 23

Simulation - Differentiable Physics Modeling

  • White-box differentiable simulation enables interpretable parameter optimization for calibration and reconstruction
  • Automatic differentiation provides exact gradients
  • Ideal for high-dimensional calibration problems
  • GPU-accelerated: O(ms) per event of 1M photons

Implementation Highlights:

  • JAX framework enables native automatic differentiation
  • Spatial grid filtering reduces PMT checks from a naive 10k to ~10 per photon
  • Modular architecture enables seamless parameter extension
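The grid-filtering idea can be sketched as follows (a simplified NumPy version; the real JAX implementation differs in detail, e.g. it would also search neighboring cells):

```python
import numpy as np
from collections import defaultdict

def build_grid(pmt_positions, cell_size):
    """Hash each PMT into a coarse 3D grid cell so a photon only needs to
    check PMTs near its own cell instead of scanning the full list."""
    grid = defaultdict(list)
    for i, p in enumerate(pmt_positions):
        grid[tuple((p // cell_size).astype(int))].append(i)
    return grid

def candidate_pmts(grid, photon_pos, cell_size):
    """Return PMT indices in the photon's grid cell (a real system would
    also check the 26 neighboring cells)."""
    return grid[tuple((photon_pos // cell_size).astype(int))]

rng = np.random.default_rng(3)
pmts = rng.uniform(0, 100, size=(10_000, 3))       # 10k illustrative PMTs
grid = build_grid(pmts, cell_size=10.0)

photon = np.array([42.0, 7.0, 88.0])
cands = candidate_pmts(grid, photon, cell_size=10.0)
assert len(cands) < len(pmts)                       # far fewer checks per photon
```

With 1000 cells over 10k PMTs, each lookup touches about ten candidates on average, matching the reduction quoted above.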


Detector response showing accumulated charge in PMTs (1M photons)

Loss landscape when varying track position and opening angle; white streamlines show computed gradient directions

O. Alterkait

15 of 23

Good practices

Dr. Saúl Alonso-Monsalve – ETH Zurich


  • Typical “conceptual” issues in neutrinos:
    • Choosing the wrong method (not all problems need a neural network).
    • When working with neural networks, implementing a suboptimal architecture (e.g., limiting the net’s receptive field through the architecture).
    • Turning a regression problem into a classification one (i.e., discretising a continuous output).
    • Not leveraging state-of-the-art methods (not always included in TMVA…).

Preventing common mistakes.

16 of 23

ML model uncertainty

Dr. Saúl Alonso-Monsalve – ETH Zurich


  • It is not always enough to make precise predictions.
    • In physics analysis, it’s typically necessary to quantify the uncertainty in the ML predictions.

  • An ensemble of models might be sufficient.

  • Some approaches introduce probabilistic components into the models.
  • Uncertainty Propagation & Estimation: Link to talk.
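The ensemble option above can be sketched directly: the spread of predictions across independently trained models serves as the uncertainty estimate (the prediction values below are stand-ins):

```python
import numpy as np

# Stand-in predictions from 5 independently trained models for 3 events
# (in practice each row comes from a model trained with a different seed).
ensemble_preds = np.array([
    [0.91, 0.40, 0.10],
    [0.93, 0.55, 0.12],
    [0.90, 0.35, 0.09],
    [0.94, 0.60, 0.11],
    [0.92, 0.45, 0.10],
])

mean = ensemble_preds.mean(axis=0)   # central prediction per event
std = ensemble_preds.std(axis=0)     # disagreement = uncertainty estimate

# The ensemble disagrees most on the second event: the least trustworthy score
assert std[1] > std[0] and std[1] > std[2]
```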

17 of 23

Problem

  • Neutrino generators (e.g., GENIE, NEUT) are great, but not perfect.
    • They rely on a variety of theoretical models and assumptions to simulate the complex interactions of neutrinos with matter (e.g. determining the final-state particles).
    • Other uncertainties can be fixed by tuning the simulation with calibration data.

  • Possible solution: train only on GEANT4 for controlled single-particle simulations.
    • Particle gun (PGUN) or particle bomb (PBomb) samples (must cover the unknown detector data distribution).
    • Provides precise control over initial conditions.
    • Facilitates systematic study of detector response.

Dr. Saúl Alonso-Monsalve – ETH Zurich


Can we trust a ML model?

18 of 23

Domain adaptation

  • With the solution described in the previous slide, one needs to spend considerable effort defining the PGUN/PBomb samples needed for training/testing.

  • A small shift in your distribution can make your ML model perform poorly.

  • Solution: domain adaptation models.

Dr. Saúl Alonso-Monsalve – ETH Zurich


[Figure: two domains, A and B. Either force the model to learn only features common across both domains, or force a domain shift from one domain onto the other, through meta-learning, contrastive learning, differentiable simulations, etc.]

19 of 23

Black box?

Dr. Saúl Alonso-Monsalve – ETH Zurich


20 of 23

Example: understanding a trained model (DUNE CVN)

Dr. Saúl Alonso-Monsalve – ETH Zurich


  • Occlusion tests:
    • Hide parts of the images and check how the CVN reacts to the changes.
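The occlusion test in miniature: slide a mask over the input, re-evaluate the classifier, and map where the score drops (the scoring function below is a stand-in, not the DUNE CVN):

```python
import numpy as np

def toy_score(image):
    """Stand-in for a trained classifier score: responds only to total charge
    in the left half, mimicking sensitivity to a shower start."""
    return image[:, :4].sum()

def occlusion_map(image, score_fn, patch=2):
    """Score drop when each patch is zeroed out -> importance heat map."""
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0.0
            heat[i // patch, j // patch] = base - score_fn(masked)
    return heat

img = np.ones((8, 8))
heat = occlusion_map(img, toy_score)
# Occluding the left half (which drives the score) matters; the right half doesn't
assert heat[:, :2].min() > 0 and np.allclose(heat[:, 2:], 0)
```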

[Figure: occlusion tests on an electron neutrino (νe) event and a muon neutrino (νμ) event, each showing the original image and its occlusion map.]

Removing the start of the electron shower reduces the 𝜈e score, as expected

The CVN finds the vertex a bit ambiguous, but it is using the end point of the muon to gain a handle on the event type.

21 of 23

Topic 2:

Electron vs. multi-𝛾 event classification


Maksimovic et al., J. Cosmol. Astropart. Phys. 051 (2021)

  • Input PMT time and charge hitmap images

S. Fujita, S. Han, Y. Koshio

22 of 23

Topic 3:

Muon track reconstruction


  • DETR model (https://github.com/facebookresearch/detr)
    • Feature extractor
      • CNN (ResNet) backbone
      • Transformer encoder-decoder
    • Prediction head ← Modify this part for muon detection
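Swapping the prediction head amounts to keeping DETR's per-query decoder embeddings and attaching a new map to muon-track parameters. A NumPy stand-in (the dimensions and the choice of track parameterization are assumptions, not the actual modification):

```python
import numpy as np

def muon_head(query_embeddings, w_track, w_logit):
    """Replacement prediction head (sketch): each decoder query embedding ->
    one candidate muon track (entry xyz, exit xyz) plus a presence logit."""
    tracks = query_embeddings @ w_track      # (queries, 6): entry + exit points
    logits = query_embeddings @ w_logit      # (queries,): is this a real muon?
    return tracks, logits

rng = np.random.default_rng(4)
queries = rng.normal(size=(100, 256))        # DETR-like: 100 queries, dim 256
w_track = rng.normal(size=(256, 6)) * 0.01   # illustrative untrained weights
w_logit = rng.normal(size=(256,)) * 0.01

tracks, logits = muon_head(queries, w_track, w_logit)
```

The CNN backbone and transformer encoder-decoder stay as in the original model; only this final mapping (and the matching loss) changes for muon detection.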

https://arxiv.org/abs/2005.12872

S. Fujita, S. Han, Y. Koshio

23 of 23

Uses of machine learning in SK analyses

  • Recently, the number of analyses involving ML has been increasing
    • We need to develop large-scale machine learning infrastructure (GPU access, ML libraries, etc.) to support these efforts
  • Machine learning models are trained on MC samples
    • Validation using real data is critical, but appropriate data samples are often difficult to obtain
    • We need to utilize and/or newly develop appropriate calibration sources


S. Fujita, S. Han, Y. Koshio