A Fair Universe: Unbiased Data Benchmark Ecosystem for Physics,
and Experiences from ML Challenges
Wahid Bhimji
Data, AI and Analytics Services Group, NERSC
Paolo Calafiura, Steven Farrell, Aishik Ghosh, Isabelle Guyon, Shih-Chieh Hsu, Elham Khoda, Benjamin Nachman, Peter Nugent, Mathis Reymond, David Rousseau, Ihsan Ullah, Daniel Whiteson
and more …
ML Benchmarks/Challenges: already pushing the boundaries of science
LHC Olympics https://lhco2020.github.io/homepage/
An example: TrackML
https://www.kaggle.com/c/trackml-particle-identification
https://competitions.codalab.org/competitions/20112
Benchmarks can also help drive compute performance
MLPerf is becoming an industry standard for ML performance
MLPerf Training v2.1 included nearly 200 results
MLPerf HPC
Benchmarks for science workloads on HPC supercomputers
These push on HPC systems in important ways. Currently including:
Future considerations:
Background on FAIR Universe Project
Abstract:
“We [will] provide an open, large-compute-scale AI ecosystem for sharing datasets, training large models, fine-tuning those models, and hosting challenges and benchmarks. We will organize a challenge series, progressively rolling in tasks of increasing difficulty, based on novel datasets. Specifically, the tasks will focus on discovering and minimizing the effects of systematic uncertainties in HEP.”
Some background and definitions: Tasks, Benchmarks, etc.
Task: a problem to be solved, calling for an Algorithm, using a Dataset or data generated on-the-fly by a Simulator. Requires:
(1) a split between "input-data" made available to the Algorithm and hidden "reference-data" (the problem solution);
(2) an "ingestion-program" and a "scoring-program" (a minimal scoring-program sketch follows these definitions);
(3) a starting kit, including instructions and one sample Algorithm (the baseline).
Our platform will allow ingestion programs to interact directly with Simulators for data on demand
Challenge: Tasks with start and end dates, and prizes for winning entries. Challenges usually have two phases:
(1) Feedback phase: participants upload Algorithms and get performance feedback on validation Tasks via a leaderboard.
(2) Final test phase: run on final test Tasks (different from the validation Tasks), without any feedback.
Benchmark: Task benchmarks (TBKs), the most common kind, are like challenges except that they
(i) have no end date, and
(ii) allow multiple entries per participant.
Algorithm benchmarks (ABKs) invert Tasks and Algorithms: participants submit Tasks.
These can identify algorithm weaknesses, in the spirit of "data-centric" or "adversarial" AI.
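To make the Task pieces concrete, below is a minimal sketch of a scoring-program in the style used by Codabench/CodaLab competitions. The directory layout (ref/ for the hidden reference-data, res/ for the submitted predictions), the file names, and the accuracy metric are illustrative assumptions, not the FAIR Universe specification.

```python
# Minimal scoring-program sketch (illustrative, not the FAIR Universe spec):
# compares the Algorithm's predictions against the hidden reference-data
# and writes a score for the leaderboard.
import json
import sys
from pathlib import Path

import numpy as np


def main(input_dir: str, output_dir: str) -> None:
    # Hidden "reference-data" (the problem solution), shipped with the Task.
    y_true = np.loadtxt(Path(input_dir) / "ref" / "labels.txt")
    # Predictions written by the participant's Algorithm via the ingestion-program.
    y_pred = np.loadtxt(Path(input_dir) / "res" / "predictions.txt")

    # Example metric; a real Task would define its own scoring function.
    scores = {"accuracy": float(np.mean(y_true == y_pred))}

    with open(Path(output_dir) / "scores.json", "w") as f:
        json.dump(scores, f)


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

In a Challenge, the same program scores both the feedback-phase validation Tasks and the final test Tasks; only the leaderboard feedback differs.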
FAIR Universe: project objectives
Codabench and the FAIR Universe Platform
Based on https://www.codabench.org/
FAIR Universe Platform: backed by NERSC!
Perlmutter (#7 on the Top500, 93.8 PF peak):
- 1,536 GPU-accelerated nodes: 4 NVIDIA A100 GPUs + 1 AMD "Milan" CPU each; 384 TB (CPU) + 240 TB (GPU) memory
- 3,072 CPU-only nodes: 2 AMD "Milan" CPUs each; 1,536 TB CPU memory
- HPE Slingshot 11 ethernet-compatible interconnect: 4 NICs per GPU node, 1 NIC per CPU node
- 35 PB all-flash scratch, 5 TB/s

Cori (30 PF peak):
- 9,600 Intel Xeon Phi "KNL" manycore nodes; 2,000 Intel Xeon "Haswell" nodes
- 700,000 processor cores, 1.2 PB memory
- Cray XC40 with Aries dragonfly interconnect
- 28 PB scratch (700 GB/s); 2 PB burst buffer (1.5 TB/s)

Shared storage and network:
- 120 PB Common File System; 275 TB /home; ~200 PB HPSS tape archive
- Ethernet & IB fabric with 100 GB/s, 5 GB/s (DTNs, Spin, gateways), 2 x 100 Gb/s (SDN), and 50 GB/s (LAN) links
- Science-friendly security, production monitoring, power efficiency
Uncertainty-aware learning: Fundamental science in practice
[Figure: theory parameters (Ωc, σ8, θ12, mH, …) feed simulations ("theory into simulations"); summary statistics connect the simulated events to the experimental/observational reconstruction.]
Differences between simulation and data bias the measurements.
Baseline approach
Train a classifier on nominal simulation (Z = 1) and estimate uncertainties afterwards with alternate simulations: either a full profile likelihood, or shift Z and look at the impact (sketched in code below).
[Figure: a classifier is trained on simulation with Z = 1.0 and applied to data with unknown Z; simulations with Z = 0.95 and Z = 1.05 are used to assess the impact of the shift.]
This gives a simplistic estimate of the uncertainty; the full treatment is a profile likelihood.
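A minimal sketch of this baseline, with a toy simulate() function standing in for a real HEP simulator; the features, the nuisance parameter's effect, and the selection threshold are all illustrative assumptions:

```python
# Baseline sketch: train on nominal simulation only, then gauge the
# systematic by re-evaluating on Z-shifted simulations.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)

def simulate(z, n=20_000):
    """Toy stand-in for a real simulator: the nuisance parameter z
    rescales one feature (think tau-energy scale)."""
    y = rng.integers(0, 2, n)                         # 0 = background, 1 = signal
    x = rng.normal(loc=y[:, None], scale=1.0, size=(n, 2))
    x[:, 0] *= z                                      # z shifts the "energy" feature
    return x, y

# Train only on the nominal simulation (Z = 1.0).
x_nom, y_nom = simulate(1.0)
clf = HistGradientBoostingClassifier().fit(x_nom, y_nom)

# "Shift Z and look at the impact": re-evaluate a summary statistic
# (here the number of selected signal-like events) on shifted simulations.
def selected_count(z):
    x, _ = simulate(z)
    return int(np.sum(clf.predict_proba(x)[:, 1] > 0.5))

nominal = selected_count(1.0)
syst = max(abs(selected_count(z) - nominal) for z in (0.95, 1.05))
print(f"selected events: {nominal} +/- {syst} (systematic from Z shifts)")
```

The spread of the summary statistic across the shifted simulations is the simplistic estimate; the full treatment profiles Z in the likelihood instead.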
Other ideas have focussed on decorrelation
[Figure: two decorrelation strategies. Adversarial training: the classifier output is made insensitive to Z, shown for various values of Z. Data augmentation: simulations with Z = 0.7, 0.9, 1.1, and 1.2 are combined for training before applying to data with unknown Z.]
Similar ideas: arXiv:1905.10384, arXiv:1305.7248, arXiv:1907.11674.
Nice comparison by Estrade et al.
The aim is to reduce the final uncertainty (a minimal data-augmentation sketch follows).
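As a sketch of the data-augmentation variant, reusing the toy simulate() helper from the baseline sketch above (again an illustrative assumption, not a prescribed recipe):

```python
# Train one classifier on a mixture of Z values so its output depends
# less on the nuisance parameter (the data-augmentation route to
# decorrelation). Reuses simulate() from the baseline sketch.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

z_grid = [0.7, 0.9, 1.1, 1.2]                 # augmentation grid from the figure
samples = [simulate(z) for z in z_grid]
x_aug = np.concatenate([x for x, _ in samples])
y_aug = np.concatenate([y for _, y in samples])

clf_aug = HistGradientBoostingClassifier().fit(x_aug, y_aug)
```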
Instead of decorrelation, the classifier can be fully parameterised in Z, in an "uncertainty-aware" way (sketched below).
[Figure: the parameterised classifier applied to data with unknown Z.]
Ghosh, Nachman, Whiteson
Dataset from the HiggsML Challenge, modified to include a systematic uncertainty on the tau-energy scale.
The uncertainty-aware classifier performs as well as a classifier trained on the true Z (single systematic, limited data).
[Plot: μ = 1 (signal strength), Z = 0.8; narrower is better, up is better.]
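A sketch of the parameterised idea under the same toy assumptions: Z is appended to the inputs during training, so a single classifier covers the whole nuisance range and Z can later be profiled over rather than decorrelated away.

```python
# Z-parameterised ("uncertainty-aware") classifier sketch, reusing the
# toy simulate() helper: Z becomes an input feature.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

train_z = np.linspace(0.8, 1.2, 9)
xs, ys = [], []
for z in train_z:
    x, y = simulate(z)
    xs.append(np.column_stack([x, np.full(len(x), z)]))   # append Z as a feature
    ys.append(y)
clf_aware = HistGradientBoostingClassifier().fit(np.concatenate(xs),
                                                 np.concatenate(ys))

# At inference, the same events can be scored under different Z hypotheses,
# e.g. as input to a profile likelihood over (mu, Z).
x_data, _ = simulate(1.0)
for z in (0.9, 1.0, 1.1):
    scores = clf_aware.predict_proba(
        np.column_stack([x_data, np.full(len(x_data), z)]))[:, 1]
    print(f"Z hypothesis {z}: mean score {scores.mean():.3f}")
```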
Push for novel uncertainty-aware approaches
E.g. Inferno: arXiv:1806.04743
NEOS: arXiv:2203.05570
Methods for direct profile likelihood: e.g. arXiv:2203.13079
Need for new datasets, benchmarks, and a platform to enable progress (a toy profile-likelihood sketch follows).
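To make "direct profile likelihood" concrete, here is a toy sketch of profiling a nuisance parameter Z out of a single-bin counting likelihood; all numbers (yields, the 5% constraint on Z) are invented for illustration.

```python
# Toy profile likelihood: for each signal strength mu, minimise the
# negative log-likelihood over the nuisance parameter Z, then read off
# the 68% interval from -2*Delta(logL) = 1.
import numpy as np
from scipy.stats import poisson

n_obs = 120                                    # observed event count (toy)

def expected(mu, z):
    # Toy model: signal scales with mu, background scales with Z.
    return mu * 50.0 + 70.0 * z

def nll(mu, z, sigma_z=0.05):
    # Poisson term plus a Gaussian constraint on Z around its nominal value 1.
    return (-poisson.logpmf(n_obs, expected(mu, z))
            + 0.5 * ((z - 1.0) / sigma_z) ** 2)

z_grid = np.linspace(0.8, 1.2, 401)
def profiled_nll(mu):
    return min(nll(mu, z) for z in z_grid)     # profile: minimise over Z

mus = np.linspace(0.5, 1.5, 101)
curve = np.array([profiled_nll(m) for m in mus])
best = mus[np.argmin(curve)]
inside = mus[2 * (curve - curve.min()) < 1.0]  # -2*Delta(logL) < 1
print(f"mu = {best:.2f}, 68% interval [{inside.min():.2f}, {inside.max():.2f}]")
```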
Conclusions
Questions? Collaboration?
Wahid Bhimji
wbhimji@lbl.gov