1 of 19

A Fair Universe: Unbiased Data Benchmark Ecosystem for Physics

and Experiences from ML Challenges

Wahid Bhimji

Data, AI and Analytics Services Group, NERSC

Paolo Calafiura, Steven Farrell, Aishik Ghosh, Isabelle Guyon, Shih-Chieh Hsu, Elham Khoda, Benjamin Nachman, Peter Nugent, Mathis Reymond, David Rousseau, Ihsan Ullah, Daniel Whiteson

and more …

2 of 19

ML Benchmarks/Challenges: already pushing the boundaries of science


Higgs Kaggle Challenge

https://www.kaggle.com/c/higgs-boson

3 of 19

An example: TrackML


4 of 19

Benchmarks can also help drive compute performance

MLPerf is becoming an industry standard for measuring ML performance

MLPerf Training v2.1 included nearly 200 results


5 of 19

MLPerf HPC

For science on HPC supercomputers

These benchmarks push HPC systems in important ways. Currently included:

    • CosmoFlow - 3D CNN predicting cosmological parameters
    • DeepCAM - segmentation of extreme weather phenomena in climate simulations
    • OpenCatalyst - GNN modeling atomic catalyst systems


  • MLPerf HPC v1.0 released at the SC21 conference:
    • Time-to-train and “weak-scaling” (models/min) metrics
    • Strong-scaling submissions scaled up to 2,048 GPUs
    • “Weak-scaling” submissions reached 5,120 GPUs (Perlmutter) and 82,944 CPUs (Fugaku)
  • Deeper analysis paper at the SC21 MLHPC workshop
  • MLPerf HPC v2.0 round unveiled at SC22

Future considerations:

  • Possible Drug Design and Protein folding (AlphaFold/OpenFold) benchmarks
  • Power measurements; a long-lived “leaderboard”
  • Bootcamps / hackathons to help the community learn

6 of 19

7 of 19

Background on FAIR Universe Project

  • 3-year project funded by the AI for HEP call (DE-FOA-0002705)

Abstract:

“We [will] provide an open, large-compute-scale AI ecosystem for sharing datasets, training large models, fine-tuning those models, and hosting challenges and benchmarks. We will organize a challenge series, progressively rolling in tasks of increasing difficulty, based on novel datasets. Specifically, the tasks will focus on discovering and minimizing the effects of systematic uncertainties in HEP.”


8 of 19

Some background and definitions on Tasks, Benchmarks, etc.

Task: a problem to be solved, calling for an Algorithm to solve it, using a Dataset or data generated on-the-fly by a Simulator. Requires:

(1) a split between “input-data” made available to the Algorithm and hidden “reference-data” (the problem solution)

(2) an “ingestion-program” and a “scoring-program”

(3) a starting kit, including instructions and one sample Algorithm (a baseline)

Our platform will allow ingestion programs to interact directly with Simulators for data on demand.

Challenge: Tasks with start and end dates, and prizes for winning entries. Challenges usually have two phases:

(1) Feedback phase: participants upload Algorithms and get performance feedback on validation Tasks via a leaderboard.

(2) Final test phase: run on final test Tasks (different from the validation Tasks), without any feedback.

Benchmark: Task benchmarks (TBKs, the most common kind) are like challenges except that they

(i) have no end date, and

(ii) allow multiple entries per participant.

In Algorithm benchmarks (ABKs), Tasks and Algorithms are inverted: participants submit Tasks.

These can identify algorithm weaknesses, in the spirit of “data-centric” or “adversarial” AI.
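To make the ingestion/scoring split concrete, here is a minimal sketch of a scoring program in the style of Codabench-like platforms. The file paths, command-line interface, and accuracy metric are illustrative assumptions, not the platform's actual API:

```python
# A minimal sketch of a Codabench-style scoring program, assuming a
# hypothetical file layout; the real platform defines its own conventions.
import json
import sys

import numpy as np


def score(prediction_file, reference_file, output_file):
    # Predictions written by the ingestion program after running the
    # participant's Algorithm on the input-data.
    y_pred = np.loadtxt(prediction_file)
    # Hidden reference-data (the problem solution), never shown to participants.
    y_true = np.loadtxt(reference_file)
    # Task-specific metric; plain accuracy is a placeholder here.
    scores = {"accuracy": float(np.mean(np.round(y_pred) == y_true))}
    with open(output_file, "w") as f:
        json.dump(scores, f)


if __name__ == "__main__":
    score(sys.argv[1], sys.argv[2], sys.argv[3])
```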


9 of 19

FAIR Universe: project objectives


10 of 19


11 of 19

Codabench and “Fair Universe” Platform


12 of 19

FAIR Universe Platform - Backed by NERSC!


Perlmutter (#7, 93.8 PF peak):

  • 1,536 GPU-accelerated nodes: 4 NVIDIA A100 GPUs + 1 AMD “Milan” CPU each
  • 384 TB (CPU) + 240 TB (GPU) memory
  • 3,072 CPU-only nodes: 2 AMD “Milan” CPUs each, 1,536 TB CPU memory
  • HPE Slingshot 11 ethernet-compatible interconnect: 4 NICs/GPU node, 1 NIC/CPU node

Cori (30 PF peak):

  • 9,600 Intel Xeon Phi “KNL” manycore nodes
  • 2,000 Intel Xeon “Haswell” nodes
  • 700,000 processor cores, 1.2 PB memory
  • Cray XC40 with Aries Dragonfly interconnect

Storage and network (from the NERSC systems diagram):

  • 35 PB all-flash scratch, 5 TB/s (Perlmutter)
  • 28 PB scratch, 700 GB/s; 2 PB burst buffer, 1.5 TB/s (Cori)
  • 120 PB Common File System; 275 TB /home; ~200 PB HPSS tape archive
  • DTNs, Spin, gateways; 2 x 100 Gb/s SDN; Ethernet & IB fabric (50-100 GB/s and 5 GB/s links)
  • Science-friendly security, production monitoring, power efficiency

13 of 19

Uncertainty-aware learning: Fundamental science in practice

Theory into Simulations

  • Theory parameters (e.g. Ωc, σ8, θ12, mH) feed the simulations
  • High-resolution, with detailed physics and instrument/detector simulation

Exp/Obs reconstruction

  • Derive positions and properties of galaxies/stars for catalogs
  • Reconstruct particle properties

Summary statistics

  • E.g. 2-pt/3-pt correlations: spatial distribution
  • E.g. masses of reconstructed particles
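As a toy illustration of a two-point summary statistic, the sketch below histograms pairwise separations in a hypothetical mock catalog; real analyses use proper estimators (e.g. Landy-Szalay) with random catalogs:

```python
# Toy two-point summary statistic: histogram of pairwise separations in a
# mock galaxy catalog (a crude stand-in for real 2-pt estimators).
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 100.0, size=(500, 3))  # mock 3D positions
separations = pdist(positions)                      # all 500*499/2 pair distances
counts, edges = np.histogram(separations, bins=20, range=(0.0, 100.0))
print(counts)
```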

14 of 19

Uncertainty-aware learning

Theory into Simulations

  • Theory parameters (Ωc, σ8, θ12, mH), as before
  • Estimate systematic uncertainties (Z)

Exp/Obs reconstruction

  • True detector state Z = ?

Differences between simulation and data bias measurements.

15 of 19

Baseline approach

Train a classifier on nominal simulation (Z = 1.0) and apply it to data (Z = ?). Then estimate the uncertainty using alternate simulations (Z = 0.95 and Z = 1.05), either as a simplistic estimate from the shift in the result, or with the full treatment: the profile likelihood.
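A toy version of this baseline, with a hypothetical one-dimensional simulate(z, n, rng) standing in for a real HEP simulator (all names and numbers are illustrative):

```python
# Toy baseline: train on nominal simulation (Z = 1.0), then gauge the
# systematic by applying the *fixed* classifier to Z-shifted simulations.
import numpy as np
from sklearn.neural_network import MLPClassifier


def simulate(z, n, rng):
    """Hypothetical simulator: Z scales the location of the signal class."""
    y = rng.integers(0, 2, n)
    x = rng.normal(loc=y * z, scale=1.0, size=n)
    return x.reshape(-1, 1), y


rng = np.random.default_rng(0)
x_nom, y_nom = simulate(1.0, 20_000, rng)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(x_nom, y_nom)

# Simplistic uncertainty estimate: spread of a summary statistic (mean
# classifier score on signal) across the alternate simulations.
summary = {}
for z in (0.95, 1.0, 1.05):
    x_alt, y_alt = simulate(z, 20_000, rng)
    summary[z] = clf.predict_proba(x_alt[y_alt == 1])[:, 1].mean()
print(summary, "spread:", max(summary.values()) - min(summary.values()))
```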

16 of 19

Other ideas have focussed on decorrelation

Adversarial training

  • Train the classifier jointly with an adversary that tries to infer Z from the classifier output, pushing that output to be similar across values of Z
  • Figure: classifier output for various values of Z

Data augmentation

  • Pool training simulations across several Z values (Z = 0.7, 0.9, 1.1, 1.2), then apply to data (Z = ?)

Nice comparison by Estrade et al.
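A sketch of the data-augmentation variant, using the same toy simulator as above (adversarial training would additionally need an explicit adversary network, omitted here):

```python
# Data augmentation: pool training simulations across several Z values so
# the classifier cannot rely on Z-specific features (toy example).
import numpy as np
from sklearn.neural_network import MLPClassifier


def simulate(z, n, rng):
    # Same hypothetical simulator as in the baseline sketch.
    y = rng.integers(0, 2, n)
    return rng.normal(y * z, 1.0, n).reshape(-1, 1), y


rng = np.random.default_rng(1)
parts = [simulate(z, 5_000, rng) for z in (0.7, 0.9, 1.1, 1.2)]
x_aug = np.concatenate([x for x, _ in parts])
y_aug = np.concatenate([y for _, y in parts])

# Trained on the Z mixture, the classifier's output is (approximately)
# decorrelated from Z.
clf_aug = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(x_aug, y_aug)
```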

17 of 19

Aim is to reduce final uncertainty

Instead of decorrelation, one can fully parameterise the classifier on Z in an “uncertainty-aware” way

Ghosh, Nachman, Whiteson, Phys. Rev. D 104, 056026:

  • Dataset from the HiggsML Challenge, modified to include a systematic uncertainty on the “tau-energy scale”
  • The uncertainty-aware classifier performs as well as a classifier trained on the true Z (single systematic, limited data)
  • Figure: signal strength (μ) measurement at μ = 1, Z = 0.8, applied to data with unknown Z; narrower is better, up is better
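A minimal sketch of the parameterised-classifier idea, again with the toy simulator; the real study trains on the HiggsML-derived dataset, not this stand-in:

```python
# Parameterised ("uncertainty-aware") classifier in the spirit of Ghosh,
# Nachman & Whiteson: Z is appended as an input feature, so one network
# covers a whole family of detector states (toy example).
import numpy as np
from sklearn.neural_network import MLPClassifier


def simulate(z, n, rng):
    # Same hypothetical simulator as in the earlier sketches.
    y = rng.integers(0, 2, n)
    return rng.normal(y * z, 1.0, n).reshape(-1, 1), y


rng = np.random.default_rng(2)
xs, zs, ys = [], [], []
for z in np.linspace(0.8, 1.2, 9):  # sample the nuisance during training
    x, y = simulate(z, 2_000, rng)
    xs.append(x)
    zs.append(np.full((len(y), 1), z))
    ys.append(y)
features = np.hstack([np.concatenate(xs), np.concatenate(zs)])
labels = np.concatenate(ys)

clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300).fit(features, labels)

# At analysis time the same network can be evaluated at any hypothesised Z,
# e.g. the Z = 0.8 scenario shown on the slide:
x_test, _ = simulate(0.8, 1_000, rng)
scores = clf.predict_proba(np.hstack([x_test, np.full((1_000, 1), 0.8)]))[:, 1]
```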

18 of 19

Push for novel uncertainty-aware approaches

E.g. INFERNO: arXiv:1806.04743

neos: arXiv:2203.05570

Methods for direct profile likelihood: e.g. arXiv:2203.13079

New datasets, benchmarks, and a platform are needed to enable progress:

  • The HiggsML dataset is too small for anything ambitious (the systematic uncertainty is small compared to the statistical uncertainty)
  • How to scale to many nuisance parameters (NPs)? (Training is hard, profiling is expensive)
  • Can faster methods allow for directly evaluating the profile likelihood?
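As a toy illustration of profiling, the sketch below scans a counting-experiment likelihood over the signal strength μ, minimising over the nuisance Z at each point (all numbers are invented):

```python
# Toy profile-likelihood scan for a counting experiment with one nuisance
# parameter Z: for each signal strength mu, minimise -log L over Z, with a
# Gaussian constraint on Z.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, poisson

n_obs, s, b = 120, 30.0, 100.0  # observed count, nominal signal, background


def nll(mu, z):
    expected = mu * s + z * b
    # Poisson term for the measurement, Gaussian constraint on Z.
    return -poisson.logpmf(n_obs, expected) - norm.logpdf(z, loc=1.0, scale=0.05)


def profiled_nll(mu):
    # Profile out the nuisance: minimise over Z at fixed mu.
    res = minimize_scalar(lambda z: nll(mu, z), bounds=(0.5, 1.5), method="bounded")
    return res.fun


mus = np.linspace(0.0, 2.0, 41)
curve = np.array([profiled_nll(m) for m in mus])
print("profiled best-fit mu ~", mus[curve.argmin()])
```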


19 of 19

Conclusions

Questions? Collaboration?

Wahid Bhimji

wbhimji@lbl.gov

  • Machine learning benchmarks and challenges drive progress in the sciences as well as computational performance
  • Current tasks and ML models require flexible platforms and substantial compute resources
  • Uncertainty-aware learning is a challenging problem that can fully exploit this ecosystem
