A Fair Universe: Unbiased Data Benchmark Ecosystem for Physics,
and Experiences from ML Challenges
Wahid Bhimji
Data, AI and Analytics Services Group, NERSC
Paolo Calafiura, Steven Farrell, Aishik Ghosh, Isabelle Guyon, Shih-Chieh Hsu, Elham Khoda, Benjamin Nachman, Peter Nugent, Mathis Reymond, David Rousseau, Ihsan Ullah, Daniel Whiteson
and more …
ML Benchmarks/Challenges: already pushing the boundaries of science
LHC Olympics https://lhco2020.github.io/homepage/
An example: TrackML
https://www.kaggle.com/c/trackml-particle-identification
https://competitions.codalab.org/competitions/20112
Benchmarks can also help drive compute performance
MLPerf is becoming an industry standard for ML performance
MLPerf Training v2.1 included nearly 200 results
MLPerf HPC
Benchmarks for science workloads on HPC supercomputers
These push on HPC systems in important ways. Currently including:
Future considerations:
Background on FAIR Universe Project
Abstract:
“We [will] provide an open, large-compute-scale AI ecosystem for sharing datasets, training large models, fine-tuning those models, and hosting challenges and benchmarks. We will organize a challenge series, progressively rolling in tasks of increasing difficulty, based on novel datasets. Specifically, the tasks will focus on discovering and minimizing the effects of systematic uncertainties in HEP.”
Some background and definitions: Tasks, Benchmarks, etc.
Task: a problem to be solved, calling for an Algorithm, using a Dataset or data generated on-the-fly by a Simulator. Requires:
(1) a split between "input-data" made available to the Algorithm and hidden "reference-data" (the problem solution);
(2) an "ingestion-program" and a "scoring-program" (a minimal scoring-program sketch follows these definitions);
(3) a starting kit, including instructions and one sample Algorithm (the baseline).
Our platform will allow ingestion programs to interact directly with Simulators for data on demand
Challenge: Tasks with start and end dates, and prizes for winning entries. Challenges usually have two phases:
(1) Feedback phase: participants upload Algorithms and get performance feedback on validation Tasks via a leaderboard.
(2) Final test phase: run on final test Tasks (different from the validation Tasks), without any feedback.
Benchmark: Task benchmarks (TBKs), the most common kind, are like challenges except that they
(i) have no end date, and
(ii) allow multiple entries per participant.
Algorithm benchmarks (ABKs) invert Tasks and Algorithms: participants submit Tasks.
These can identify algorithm weaknesses, in the spirit of "data-centric" or "adversarial" AI.
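To make the Task pieces concrete, below is a minimal sketch of a scoring-program in the style used by Codabench/CodaLab competitions. The directory layout (ref/ for the hidden reference-data, res/ for the submitted predictions), the file names, and the accuracy metric are illustrative assumptions, not the FAIR Universe specification.

```python
# Minimal scoring-program sketch (illustrative, not the FAIR Universe spec):
# compares the Algorithm's predictions against the hidden reference-data
# and writes a score for the leaderboard.
import json
import sys
from pathlib import Path

import numpy as np


def main(input_dir: str, output_dir: str) -> None:
    # Hidden "reference-data" (the problem solution), shipped with the Task.
    y_true = np.loadtxt(Path(input_dir) / "ref" / "labels.txt")
    # Predictions written by the participant's Algorithm via the ingestion-program.
    y_pred = np.loadtxt(Path(input_dir) / "res" / "predictions.txt")

    # Example metric; a real Task would define its own scoring function.
    scores = {"accuracy": float(np.mean(y_true == y_pred))}

    with open(Path(output_dir) / "scores.json", "w") as f:
        json.dump(scores, f)


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

In a Challenge, the same program scores both the feedback-phase validation Tasks and the final test Tasks; only the leaderboard feedback differs.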
FAIR Universe: project objectives
Codabench and the FAIR Universe Platform
Based on https://www.codabench.org/
FAIR Universe Platform: backed by NERSC!
Perlmutter (#7 on the Top500, 93.8 PF peak):
- 1,536 GPU-accelerated nodes: 4 NVIDIA A100 GPUs + 1 AMD "Milan" CPU each; 384 TB (CPU) + 240 TB (GPU) memory
- 3,072 CPU-only nodes: 2 AMD "Milan" CPUs each; 1,536 TB CPU memory
- HPE Slingshot 11 ethernet-compatible interconnect: 4 NICs per GPU node, 1 NIC per CPU node
- 35 PB all-flash scratch, 5 TB/s

Cori (30 PF peak):
- 9,600 Intel Xeon Phi "KNL" manycore nodes; 2,000 Intel Xeon "Haswell" nodes
- 700,000 processor cores, 1.2 PB memory
- Cray XC40 with Aries dragonfly interconnect
- 28 PB scratch (700 GB/s); 2 PB burst buffer (1.5 TB/s)

Shared storage and network:
- 120 PB Common File System; 275 TB /home; ~200 PB HPSS tape archive
- Ethernet & IB fabric with 100 GB/s, 5 GB/s (DTNs, Spin, gateways), 2 x 100 Gb/s (SDN), and 50 GB/s (LAN) links
- Science-friendly security, production monitoring, power efficiency
Uncertainty-aware learning: Fundamental science in practice
[Figure: theory parameters (Ωc, σ8, θ12, mH, …) feed simulations ("theory into simulations"); summary statistics connect the simulated events to the experimental/observational reconstruction.]
Differences between simulation and data bias the measurements.
Baseline approach
Train a classifier on nominal simulation (Z = 1) and estimate uncertainties afterwards with alternate simulations: either a full profile likelihood, or shift Z and look at the impact (sketched in code below).
[Figure: a classifier is trained on simulation with Z = 1.0 and applied to data with unknown Z; simulations with Z = 0.95 and Z = 1.05 are used to assess the impact of the shift.]
This gives a simplistic estimate of the uncertainty; the full treatment is a profile likelihood.
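A minimal sketch of this baseline, with a toy simulate() function standing in for a real HEP simulator; the features, the nuisance parameter's effect, and the selection threshold are all illustrative assumptions:

```python
# Baseline sketch: train on nominal simulation only, then gauge the
# systematic by re-evaluating on Z-shifted simulations.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)

def simulate(z, n=20_000):
    """Toy stand-in for a real simulator: the nuisance parameter z
    rescales one feature (think tau-energy scale)."""
    y = rng.integers(0, 2, n)                         # 0 = background, 1 = signal
    x = rng.normal(loc=y[:, None], scale=1.0, size=(n, 2))
    x[:, 0] *= z                                      # z shifts the "energy" feature
    return x, y

# Train only on the nominal simulation (Z = 1.0).
x_nom, y_nom = simulate(1.0)
clf = HistGradientBoostingClassifier().fit(x_nom, y_nom)

# "Shift Z and look at the impact": re-evaluate a summary statistic
# (here the number of selected signal-like events) on shifted simulations.
def selected_count(z):
    x, _ = simulate(z)
    return int(np.sum(clf.predict_proba(x)[:, 1] > 0.5))

nominal = selected_count(1.0)
syst = max(abs(selected_count(z) - nominal) for z in (0.95, 1.05))
print(f"selected events: {nominal} +/- {syst} (systematic from Z shifts)")
```

The spread of the summary statistic across the shifted simulations is the simplistic estimate; the full treatment profiles Z in the likelihood instead.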
Other ideas have focussed on decorrelation
[Figure: two decorrelation strategies. Adversarial training: the classifier output is made insensitive to Z, shown for various values of Z. Data augmentation: simulations with Z = 0.7, 0.9, 1.1, and 1.2 are combined for training before applying to data with unknown Z.]
Similar ideas: arXiv:1905.10384, arXiv:1305.7248, arXiv:1907.11674.
Nice comparison by Estrade et al.
The aim is to reduce the final uncertainty (a minimal data-augmentation sketch follows).
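As a sketch of the data-augmentation variant, reusing the toy simulate() helper from the baseline sketch above (again an illustrative assumption, not a prescribed recipe):

```python
# Train one classifier on a mixture of Z values so its output depends
# less on the nuisance parameter (the data-augmentation route to
# decorrelation). Reuses simulate() from the baseline sketch.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

z_grid = [0.7, 0.9, 1.1, 1.2]                 # augmentation grid from the figure
samples = [simulate(z) for z in z_grid]
x_aug = np.concatenate([x for x, _ in samples])
y_aug = np.concatenate([y for _, y in samples])

clf_aug = HistGradientBoostingClassifier().fit(x_aug, y_aug)
```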
Instead of decorrelation, the classifier can be fully parameterised in Z, in an "uncertainty-aware" way (sketched below).
[Figure: the parameterised classifier applied to data with unknown Z.]
Ghosh, Nachman, Whiteson
Dataset from the HiggsML Challenge, modified to include a systematic uncertainty on the tau-energy scale.
The uncertainty-aware classifier performs as well as a classifier trained on the true Z (single systematic, limited data).
[Plot: μ = 1 (signal strength), Z = 0.8; narrower is better, up is better.]
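A sketch of the parameterised idea under the same toy assumptions: Z is appended to the inputs during training, so a single classifier covers the whole nuisance range and Z can later be profiled over rather than decorrelated away.

```python
# Z-parameterised ("uncertainty-aware") classifier sketch, reusing the
# toy simulate() helper: Z becomes an input feature.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

train_z = np.linspace(0.8, 1.2, 9)
xs, ys = [], []
for z in train_z:
    x, y = simulate(z)
    xs.append(np.column_stack([x, np.full(len(x), z)]))   # append Z as a feature
    ys.append(y)
clf_aware = HistGradientBoostingClassifier().fit(np.concatenate(xs),
                                                 np.concatenate(ys))

# At inference, the same events can be scored under different Z hypotheses,
# e.g. as input to a profile likelihood over (mu, Z).
x_data, _ = simulate(1.0)
for z in (0.9, 1.0, 1.1):
    scores = clf_aware.predict_proba(
        np.column_stack([x_data, np.full(len(x_data), z)]))[:, 1]
    print(f"Z hypothesis {z}: mean score {scores.mean():.3f}")
```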
Push for novel uncertainty-aware approaches
E.g. Inferno: arXiv:1806.04743
NEOS: arXiv:2203.05570
Methods for direct profile likelihood: e.g. arXiv:2203.13079
Need for new datasets, benchmarks, and a platform to enable progress (a toy profile-likelihood sketch follows).
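To make "direct profile likelihood" concrete, here is a toy sketch of profiling a nuisance parameter Z out of a single-bin counting likelihood; all numbers (yields, the 5% constraint on Z) are invented for illustration.

```python
# Toy profile likelihood: for each signal strength mu, minimise the
# negative log-likelihood over the nuisance parameter Z, then read off
# the 68% interval from -2*Delta(logL) = 1.
import numpy as np
from scipy.stats import poisson

n_obs = 120                                    # observed event count (toy)

def expected(mu, z):
    # Toy model: signal scales with mu, background scales with Z.
    return mu * 50.0 + 70.0 * z

def nll(mu, z, sigma_z=0.05):
    # Poisson term plus a Gaussian constraint on Z around its nominal value 1.
    return (-poisson.logpmf(n_obs, expected(mu, z))
            + 0.5 * ((z - 1.0) / sigma_z) ** 2)

z_grid = np.linspace(0.8, 1.2, 401)
def profiled_nll(mu):
    return min(nll(mu, z) for z in z_grid)     # profile: minimise over Z

mus = np.linspace(0.5, 1.5, 101)
curve = np.array([profiled_nll(m) for m in mus])
best = mus[np.argmin(curve)]
inside = mus[2 * (curve - curve.min()) < 1.0]  # -2*Delta(logL) < 1
print(f"mu = {best:.2f}, 68% interval [{inside.min():.2f}, {inside.max():.2f}]")
```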
Conclusions
Questions? Collaboration?
Wahid Bhimji
wbhimji@lbl.gov