1 of 32

6th European Advanced Accelerator Concepts workshop (EAAC'23)

WG3: Theory and simulations

Elba, Italy, September 20th, 2023

Exascale and ML Models for Accelerator Simulations

Axel Huebl

Lawrence Berkeley National Laboratory

On behalf of the WarpX, ImpactX & pyAMReX teams: LBNL, LLNL, SLAC, CEA, DESY, TAE, CERN


LDRD


2 of 32

Funding Support

WarpX: longitudinal electric field in a laser-plasma accelerator

rendered with Ascent & VTK-m

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, in support of the nation's exascale computing imperative. This work was also performed in part by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory under U.S. Department of Energy Contract No. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344 and SLAC National Accelerator Laboratory under Contract No. DE-AC02-76SF00515. Supported by the CAMPA collaboration, a project of the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of High Energy Physics, Scientific Discovery through Advanced Computing (SciDAC) program. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725, the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231, and the supercomputer Fugaku provided by RIKEN.

The EAAC23 Workshop was supported by the EU I.FAST project. This project has received funding from the European Union's Horizon 2020 Research and Innovation programme under Grant Agreement No 101004730.

github.com/ECP-WarpX

github.com/openPMD

github.com/AMReX-Codes

github.com/picmi-standard


3 of 32

Outline

  • Advanced Accelerator Modeling at Exascale
    • WarpX and ImpactX in the Beam, Plasma and Accelerator Simulation Toolkit
    • Addressing a Cambrian Explosion of Compute Architectures
    • Plasma Mirror Simulations on the Full Size of the First Exascale Supercomputer

  • Across Scales: Advanced and Conventional Accelerators
    • Connecting Exascale and ML
    • ML-Enabled, Hybrid Beamlines

LDRD


4 of 32

Advanced Accelerator Modeling at Exascale


5 of 32

Ultimate goal: a virtual accelerator that gives users on-the-fly tunability of physics & numerics complexity

  • Incomplete physics ↔ Full physics
  • 1D-1V ↔ 3D-3V
  • Low resolution ↔ High resolution
  • Reduced models ↔ First principles
  • Fast (great for ensemble runs for design studies) ↔ Accurate (great for detailed runs for physics studies)

Goal: start-to-end modeling in an open software ecosystem.

Start-to-End Modeling R&D

  • advanced models: numerics, AI/ML surrogates
  • speed & scalability: team science with computer sci.
  • flexibility & reliability: modern software ecosystem


6 of 32

WarpX is a GPU-Accelerated PIC Code for Exascale

Available Particle-in-Cell Loops

  • electrostatic & electromagnetic (fully kinetic)

PIC loop: push particles → deposit currents → solve fields → gather fields

Geometries

  • 1D3V, 2D3V, 3D3V and RZ (quasi-cylindrical)

Advanced algorithms

boosted frame, spectral solvers, Galilean frame, embedded boundaries + CAD, mesh refinement (MR), ...

Multi-Physics Modules

field ionization of atomic levels, Coulomb collisions, QED processes (e.g. pair creation), macroscopic materials

Multi-Node parallelization

  • MPI: 3D domain decomposition
  • dynamic load balancing

On-Node Parallelization

  • GPU: CUDA, HIP and SYCL
  • CPU: OpenMP

Scalable, Standardized I/O

  • PICMI Python interface (see the input sketch after this list)
  • openPMD (HDF5 or ADIOS)
  • in situ diagnostics
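To make the PICMI bullet above concrete, here is a minimal sketch of a WarpX input script through the PICMI Python interface. The class names follow the PICMI standard; the grid size, boundary conditions, density and step count are illustrative assumptions, not values from this talk (see the WarpX documentation for complete examples).

from pywarpx import picmi  # WarpX implementation of the PICMI standard

# illustrative 3D grid; sizes and boundaries are placeholder assumptions
grid = picmi.Cartesian3DGrid(
    number_of_cells=[64, 64, 64],
    lower_bound=[-20.e-6, -20.e-6, -20.e-6],
    upper_bound=[20.e-6, 20.e-6, 20.e-6],
    lower_boundary_conditions=["periodic", "periodic", "open"],
    upper_boundary_conditions=["periodic", "periodic", "open"],
)
solver = picmi.ElectromagneticSolver(grid=grid, method="Yee", cfl=0.99)

electrons = picmi.Species(
    particle_type="electron",
    name="electrons",
    initial_distribution=picmi.UniformDistribution(density=1.e24),
)

sim = picmi.Simulation(solver=solver, max_steps=100)
sim.add_species(
    electrons,
    layout=picmi.GriddedLayout(grid=grid, n_macroparticle_per_cell=[2, 2, 2]),
)
sim.step(100)  # runs the PIC loop: push, deposit, solve, gather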


7 of 32

WarpX: conceived & developed by a multidisciplinary, multi-institution team

Ryan Sandberg, Andrew Myers, Weiqun Zhang, John Bell, Jean-Luc Vay (ECP PI), Rémi Lehe, Olga Shapoval, Ann Almgren (ECP co-PI), Mark Hogan (ECP co-PI), Lixin Ge, Cho Ng, David Grote (ECP co-PI), Revathi Jambunathan, Axel Huebl, Yinjian Zhao, Kevin Gott (NESAP), Edoardo Zoni, Hannah Klion, Prabhat Kumar, Junmin Gu, Marco Garten, Arianna Formenti

(France) Henri Vincenti, Luca Fedeli, Thomas Clark, Neïl Zaim, Pierre Bartoli
(Switzerland) Lorenzo Giacomel
(Germany) Maxence Thévenet, Alexander Sinn

  • a growing list of contributors from labs, universities… & the private sector

8 of 32

ImpactX: GPU-, AMR- & AI/ML-Accelerated Beam Dynamics

Particle-in-Cell Loop

  • electrostatic
    • with space-charge effects
  • s-based
    • relative to a reference particle
    • elements: symplectic maps

Fireproof Numerics

based on IMPACT suite of codes, esp. IMPACT-Z and MaryLie

Triple Acceleration Approach

  • GPU support
  • Adaptive Mesh Refinement
  • AI/ML & Data Driven Models

github.com/ECP-WarpX/impactx

LDRD

User-Friendly

  • single-source C++, full Python control
  • fully tested
  • fully documented

Multi-Node parallelization

  • MPI: domain decomposition
  • dynamic load balancing (in dev.)

On-Node Parallelization

  • GPU: CUDA, HIP and SYCL
  • CPU: OpenMP

Scalable, Parallel I/O

  • openPMD
  • in situ analysis

💡 Same script for CPU/GPU & MPI


9 of 32

ImpactX: Easy to Use and Extend, Tested and Documented

LDRD

github.com/ECP-WarpX/impactx

Example: ImpactX FODO Cell Lattice
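The FODO example appears on the slide as a screenshot. As a stand-in, here is a minimal linear-optics sketch of the same idea: drift and thin-quadrupole symplectic maps composed into one FODO cell and used to track a beam in s, written with plain numpy rather than the ImpactX API. The drift length, focal length and beam parameters are illustrative assumptions.

import numpy as np

# linear, thin-lens maps in (x, x') phase space; values are assumptions
L = 2.5   # drift length [m]
f = 3.0   # quadrupole focal length [m]

drift = np.array([[1.0, L],
                  [0.0, 1.0]])
qd = np.array([[1.0, 0.0],
               [+1.0 / f, 1.0]])          # defocusing thin lens
half_qf = np.array([[1.0, 0.0],
                    [-0.5 / f, 1.0]])     # half-strength focusing lens

# one FODO cell: half-QF, drift, QD, drift, half-QF (maps compose right to left)
fodo = half_qf @ drift @ qd @ drift @ half_qf

# each map is symplectic: det(M) = 1, so phase-space area is preserved
assert np.isclose(np.linalg.det(fodo), 1.0)

# track a small bunch of (x, x') pairs through 10 cells
beam = np.random.default_rng(0).normal(scale=[1e-3, 1e-4], size=(100, 2))
for _ in range(10):
    beam = beam @ fodo.T
print(beam.std(axis=0))  # beam size stays bounded for a stable cell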


10 of 32

We Develop Openly with the Community

python3 -m pip install .

brew tap ecp-warpx/warpx

brew install warpx

spack install warpx

spack install py-warpx

conda install -c conda-forge warpx

module load warpx

module load py-warpx

cmake -S . -B build

cmake --build build --target install

Open-Source Development & Benchmarks: github.com/ECP-WarpX

Online Documentation: warpx|hipace|impactx.readthedocs.io

Rapid and easy installation on any platform:

230 physics benchmarks run on every code change of WarpX

19 physics benchmarks + 106 tests for ImpactX


11 of 32

Power-Limits Seed a Cambrian Explosion of Compute Architectures

Now: distribute one simulation over 10,000s of computers, for millions of cores.
Potential future: Field-Programmable Gate Arrays (FPGA), Application-Specific Integrated Circuits (ASIC), quantum circuits.

First-of-their-kind platforms: NERSC (Intel, then Nvidia) → Exascale: OLCF (AMD), ALCF (Intel)

J-L Vay, A Huebl et al., PoP 28.2, 023105 (2021); A Myers et al., JParCo 108, 102833 (2021); L Fedeli, A Huebl et al., SC22 (2022)
Ranking and details of the 500 most powerful non-distributed computer systems: TOP500.org (June 2023)


12 of 32

Community Approaches to Exascale Programming

Software stack layers (diagram): Applications → Libraries (PIC algorithms, communication, performance portability) → Programming models (CUDA / HIP, ...) → Hardware (AMD, ARM, ...).

Then: Warp + vendor scripts.
Now: WarpX, HiPACE++, ImpactX, ARTEMIS share domain-science libraries (ABLASTR, PICSAR-QED, pyAMReX), AMReX (math, IO) and a performance-portability layer.

B Worpitz, MA thesis (2015); E Zenker, A Huebl et al., IPDPSW (2016); E Zenker, A Huebl et al., IWOPH (2017); A Matthes, A Huebl et al., P3MA (2017); A Myers et al., JParCo (2021); HC Edwards et al., SciProg (2012); RD Hornung et al., OSTI TR (2014)


13 of 32

WarpX is now 500x More Performant than its Baseline

April-July 2022: WarpX on the world's largest HPCs. L. Fedeli, A. Huebl et al., Gordon Bell Prize Winner at SC'22 (2022)

from a full stage simulation

Figure-of-Merit: weighted updates / sec

110x

500x

Note: Perlmutter & Frontier were pre-acceptance measurements!

68,608 GPUs of the First Exascale Machine

7,299,072 CPU Cores


14 of 32

2022 ACM Gordon Bell Prize: using the First Exascale Supercomputer

April-July 2022: WarpX on the world's largest HPCs. L. Fedeli, A. Huebl et al., Gordon Bell Prize Winner at SC'22 (2022)

A success story of a multidisciplinary, multi-institutional team!

L. Fedeli, A. Huebl et al., IEEE, SC22 (2022)

M. Thévenet et al., Nat. Phys. 12 (2016)


15 of 32

2022 ACM Gordon Bell Prize: using the First Exascale Supercomputer

April-July 2022: WarpX on the world's largest HPCs. L. Fedeli, A. Huebl et al., Gordon Bell Prize Winner at SC'22 (2022)

A success story of a multidisciplinary, multi-institutional team!

≈ nC

L. Fedeli, A. Huebl et al., IEEE, SC22 (2022)

M. Thévenet et al., Nat. Phys. 12 (2016)


16 of 32

2022 ACM Gordon Bell Prize: using the First Exascale Supercomputer

April-July 2022: WarpX on the world's largest HPCs. L. Fedeli, A. Huebl et al., Gordon Bell Prize Winner at SC'22 (2022)

A success story of a multidisciplinary, multi-institutional team!

L. Fedeli, A. Huebl et al., IEEE, SC22 (2022)

M. Thévenet et al., Nat. Phys. 12 (2016)


17 of 32

If You Want to Go Far, Go Together


Code A, Code B, … connected through community standards:
  • PICMI: Particle-In-Cell Modeling Interface
  • openPMD: open Particle-Mesh Data standard

Standardization…

  • Inputs
  • Data
  • Reference Implementations

strong international partnerships

A Huebl et al., DOI:10.5281/zenodo.591699 (2015)
DP Grote et al., Particle-In-Cell Modeling Interface (PICMI) (2021)
LD Amorim et al., GPos (2021); M Thévenet et al., DOI:10.5281/zenodo.8277220 (2023)
A Ferran Pousa et al., DOI:10.5281/zenodo.7989119 (2023)
RT Sandberg et al., IPAC23, DOI:10.18429/JACoW-IPAC-23-WEPA101 (2023)

LDRD

… Accelerates Innovation
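Since openPMD is one of the standards above, here is a short sketch of reading a WarpX field output with the openPMD-api Python bindings; the file path, iteration number and field component are assumptions for illustration, not taken from this talk.

import openpmd_api as io

# path and iteration are placeholders; adjust to your diagnostics settings
series = io.Series("diags/diag1/openpmd_%T.h5", io.Access.read_only)

it = series.iterations[100]      # pick one output iteration
E_z = it.meshes["E"]["z"]        # longitudinal electric field component
data = E_z.load_chunk()          # queue the read ...
series.flush()                   # ... and execute it
print(data.shape, E_z.unit_SI)   # SI conversion factor travels with the data

series.close()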


18 of 32

Across Scales: Advanced and Conventional Accelerators


19 of 32

BLAST is Now An Accelerated, Machine-Learning Boosted Ecosystem

fields & particles ↔ tensors & arrays

LDRD

A Huebl (PI), R Sandberg, R Lehe, CE Mitchell et al.

A Huebl et al., NAPAC22, DOI:10.18429/JACoW-NAPAC2022-TUYE2 (2022)

RT Sandberg et al. and A Huebl, IPAC23, DOI:10.18429/JACoW-IPAC-23-WEPA101 (2023)

A Huebl et al., AAC22, arXiv:2303.12873 (2023); RT Sandberg et al. and A Huebl, in preparation (2023)

A) Training

  • Offline: WarpX → Neural Network
  • Online (in situ): advanced ML methods

B) Inference: in situ to codes

  • Zero-copy data access: persistently on GPU
  • Example: an ML map in beam dynamics (see the sketch below)
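A hedged sketch of the zero-copy idea in B): GPU data is shared between an array library and an ML framework through the standard DLPack protocol promoted by the data-apis.org consortium, so no host-device copy is needed. The particle array and the small network below are stand-ins, not the actual WarpX/ImpactX hooks.

import cupy as cp
import torch

# stand-in for particle data that already lives on the GPU inside the PIC code
beam_6d = cp.random.standard_normal((100_000, 6), dtype=cp.float32)

# zero-copy hand-off: DLPack shares the same GPU memory, no host round-trip
beam_torch = torch.from_dlpack(beam_6d)

# stand-in surrogate: a small MLP acting as a 6D -> 6D transfer map
surrogate = torch.nn.Sequential(
    torch.nn.Linear(6, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 6),
).to("cuda")

with torch.no_grad():
    beam_out = surrogate(beam_torch)

# hand the result back to the array world, still without leaving the GPU
beam_out_cp = cp.from_dlpack(beam_out)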

GPU Workflows are blazingly fast

  • PIC simulations
  • Machine learning

Can we augment & accelerate on-GPU�PIC simulations with on-GPU ML models?

Cross-Ecosystem, In Situ Coupling

Consortium for Python Data API Standards: data-apis.org

Very easy to:

  • connect
  • vary ML models


20 of 32

Modeling Time: ML-Acceleration of Plasma Elements for Beamlines

LPA integration via AI/ML for rapid beamline design & operations.

Fast surrogates: Data-driven modeling is a potential middle ground between

  • analytical modeling and
  • full-fidelity simulations.

LDRD

A Huebl (PI), R Sandberg, R Lehe, CE Mitchell et al.

Generic beamline elements: Injector, Plasma Source, Plasma Stages, Transport.

Model speed for accelerator elements (full geometry, full physics) vs. ML-boosted for a specific problem:

  Element:            LWFA w/ inj. | Transport | LWFA Stage | PWFA Stage | Kicker Magnet
  Full-physics code:  WarpX        | ImpactX   | WarpX      | HiPACE++   | WarpX-ES
  Simulation time:    hrs          | sec       | hrs        | hrs        | min
  ML-boosted:         ML           | ImpactX   | ML         | ML         | ML

  • start-to-end collider modeling
  • digital twin / 'real-time'

A Huebl et al., NAPAC22, DOI:10.18429/JACoW-NAPAC2022-TUYE2 (2022)
RT Sandberg et al. and A Huebl, IPAC23, DOI:10.18429/JACoW-IPAC-23-WEPA101 (2023)
A Huebl et al., AAC22, arXiv:2303.12873 (2023); RT Sandberg et al. and A Huebl, in preparation (2023)

Model choice for complex, nonlinear, many-body systems: pick two of level of detail, speed, and accuracy (trade-off triangle spanned by simulation, data-driven and analytical models).


21 of 32

We Trained a Neural Net with WarpX for Staging of Electrons

(Diagram: fast vs. precise trade-off for analytical, simulation and surrogate models.)

Error of beam moments (plot): stage 1, stage 2, combined beamline

Training data: 50,000 particles / beam

LDRD

A Huebl (PI), R Sandberg, R Lehe, CE Mitchell et al.

A Huebl et al., NAPAC22, DOI:10.18429/JACoW-NAPAC2022-TUYE2 (2022)

RT Sandberg et al. and A Huebl, IPAC23, DOI:10.18429/JACoW-IPAC-23-WEPA101 (2023)

A Huebl et al., AAC22, arXiv:2303.12873 (2023); RT Sandberg et al. and A Huebl, in preparation (2023)

one-time cost: a few-hour WarpX simulation + 10 min of training

Beamline: LWFA Stage 1 → Drift → Lens → Drift → LWFA Stage 2 → Drift → …

few-pC e- beam

Hyperparameters

  • 6D in, 6D out
  • <10 layers with a few hundred nodes each are sufficient (see the training sketch after this list)
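A minimal sketch, under the hyperparameter assumptions above, of what a 6D-in / 6D-out surrogate could look like in PyTorch. The random stand-in data, layer widths and optimizer settings are illustrative; the real training data comes from WarpX particle output, as described in the references.

import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-in data: 6D phase-space coordinates before/after a plasma stage
# (in the actual workflow these come from WarpX particle dumps)
x_in = torch.randn(50_000, 6)
x_out = torch.randn(50_000, 6)

# a few hidden layers with a few hundred nodes each (see bullet above)
model = torch.nn.Sequential(
    torch.nn.Linear(6, 256), torch.nn.Tanh(),
    torch.nn.Linear(256, 256), torch.nn.Tanh(),
    torch.nn.Linear(256, 256), torch.nn.Tanh(),
    torch.nn.Linear(256, 6),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-3)
loss_fn = torch.nn.MSELoss()

loader = DataLoader(TensorDataset(x_in, x_out), batch_size=1024, shuffle=True)
for epoch in range(100):  # illustrative epoch count
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()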

A Neural Net is a non-linear transfer map!

Assumption: purely tracking

A single NN can learn details of multiple stages (e.g., 10, 20, 30 GeV).

Assumption: laser-plasma parameters stay the same.


22 of 32

We Trained a Neural Net with WarpX for Staging of Electrons

(Diagram: fast vs. precise trade-off for analytical, simulation and surrogate models.)

ImpactX: after 2 surrogates; WarpX: 2-stage simulation

Error of beam moments (plot): stage 1, stage 2, combined beamline

Training data: 50,000 particles / beam

LDRD

A Huebl (PI), R Sandberg, R Lehe, CE Mitchell et al.

A Huebl et al., NAPAC22, DOI:10.18429/JACoW-NAPAC2022-TUYE2 (2022)

RT Sandberg et al. and A Huebl, IPAC23, DOI:10.18429/JACoW-IPAC-23-WEPA101 (2023)

A Huebl et al., AAC22, arXiv:2303.12873 (2023); RT Sandberg et al. and A Huebl, in preparation (2023)

Open challenges: learning microscopic and collective effects simultaneously.

one-time cost: a few-hour WarpX simulation + 10 min of training

Beamline: LWFA Stage 1 → Drift → Lens → Drift → LWFA Stage 2 → Drift → …

ImpactX simulation time: <1 sec

Flexible, Hybrid Beamline Sim

  • any 6D beam input
  • tune lens, transport, …
  • modify ML models

Same super-fast evaluation! (see the sketch below)
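To illustrate the hybrid-beamline idea in a generic way (this is not the ImpactX API), the sketch below chains simple linear transport maps with an ML stage acting as a 6D transfer map; the drift matrix, the input beam and the stand-in surrogate are assumptions.

import numpy as np
import torch

# stand-in for the trained 6D -> 6D surrogate from the training sketch above
surrogate = torch.nn.Sequential(
    torch.nn.Linear(6, 256), torch.nn.Tanh(), torch.nn.Linear(256, 6),
)

def drift_map(ds):
    """Linear 6x6 drift map (transverse planes only, illustrative)."""
    M = np.eye(6)
    M[0, 1] = ds  # x += ds * x'
    M[2, 3] = ds  # y += ds * y'
    return M

def ml_stage(beam):
    """Apply the ML surrogate as a nonlinear transfer map."""
    with torch.no_grad():
        out = surrogate(torch.as_tensor(beam, dtype=torch.float32))
    return out.numpy().astype(beam.dtype)

# any 6D input beam; here a small Gaussian stand-in
beam = np.random.default_rng(1).normal(scale=1.0e-3, size=(10_000, 6))
beam = beam @ drift_map(0.5).T   # conventional transport element
beam = ml_stage(beam)            # ML surrogate of a plasma stage
beam = beam @ drift_map(0.5).T   # more conventional transport

Swapping the surrogate or retuning the conventional elements only changes one line each, which is the flexibility the bullets above refer to.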

few-pC e- beam


23 of 32

Summary

  • BLAST is a fully open suite of PIC codes for particle accelerator modeling, sharing code through libraries and leveraging the U.S. DOE Exascale software stack.
    • WarpX was our first Exascale app, for relativistic t-based laser-plasma & beam modeling.
    • ImpactX leverages these developments for s-based beam dynamics.

presented by: Axel Huebl (LBNL)

📧 axelhuebl@lbl.gov

  • Vibrant Ecosystem and Contributions
    • Runs on any platform: Linux, macOS, Windows
    • Public development, automated testing, review & documentation
    • Friendly, open & helpful community

LDRD

github.com/ECP-WarpX

github.com/openPMD

github.com/AMReX-Codes

github.com/picmi-standard

(Trade-off triangle: level of detail vs. speed vs. accuracy, spanned by simulation, data-driven and analytical models.)

  • Seamless, GPU-Accelerated Combination of PIC and AI/ML
    • zero-copy GPU data access: in situ models, application coupling
    • Scripted: easy to vary & research new data models


24 of 32

Backup Slides


25 of 32

Abstract (16'+4')

Computational modeling is essential to the exploration and design of advanced particle accelerators. The modeling of laser-plasma acceleration and interaction can achieve predictive quality for experiments if adequate resolution, full geometry and physical effects are included.

Here, we report on the significant evolution in fully relativistic full-3D modeling of conventional and advanced accelerators in the WarpX and ImpactX codes with the introduction of Exascale supercomputing and AI/ML models. We will cover the first PIC simulations on an Exascale machine, the need for and evolution of open standards, and based on our fully open community codes, the connection of time and space scales from plasma to conventional beamlines with data-driven machine-learning models.


26 of 32

WarpX in ECP: Staging of Laser-Driven Plasma Acceleration

Goal: deliver & scientifically use the nation’s first exascale systems

  • ExaFLOP: a quintillion (10^18) calculations per second
  • ensure all the necessary pieces are concurrently in place

first 3D simulation of a chain of plasma accelerator stages for future colliders

Our DOE science case is in HEP, our methods are ASCR:


27 of 32

WarpX in ECP: Staging of Laser-Driven Plasma Acceleration

J.-L. Vay, A. Huebl et al., ISAV'20 Workshop Keynote (2020) and PoP 28.2, 023105 (2021); L. Fedeli, A. Huebl et al., SC22 (2022)
J.-L. Vay et al., ECP WarpX MS FY23.1; A. Ferran Pousa et al., IPAC23, DOI:10.18429/JACoW-IPAC-23-TUPA093 (2023)

next

First-of-their-kind platforms: NERSC (Intel, then Nvidia) → Exascale: OLCF (AMD), ALCF (Intel)

Ascent & VTK-m In Situ Visualization: N. Marsaglia, C. Harrison, A. Huebl


28 of 32

BLAST is Now An Accelerated, ML-Modeling Ecosystem

fields & particles ↔ tensors & arrays

LDRD

A Huebl (PI), R Sandberg, R Lehe, CE Mitchell et al.

A Huebl et al., NAPAC22, DOI:10.18429/JACoW-NAPAC2022-TUYE2 (2022)

RT Sandberg et al. and A Huebl, IPAC23, DOI:10.18429/JACoW-IPAC-23-WEPA101 (2023)

A Huebl et al., AAC22, arXiv:2303.12873 (2023); RT Sandberg et al. and A Huebl, in preparation (2023)

A) Training

  • Offline: WarpX → Neural Network
  • Online (in situ): advanced ML methods

B) Inference: in situ to codes

  • Zero-copy data access: persistently on GPU
  • Example: an ML map in beam dynamics

C Badiali et al., JPlasmaPhys. 88.6 (2022)

Related Works: not or only partly GPU-accelerated

  • bottlenecks in host-device I/O, slower
  • quality of prediction

Cross-Ecosystem, In Situ Coupling

Consortium for Python Data API Standards: data-apis.org

Very easy to:

  • connect
  • switch to other models

All-GPU Workflows are blazingly fast

  • PIC simulations
  • ML Models

Can we augment & accelerate on-GPU�PIC simulations with on-GPU ML models?


29 of 32

The WarpX Software Stack

Applications: WarpX (full PIC, LPA/LPI), ImpactX (accelerator lattice design), HiPACE++ (quasi-static, PWFA), ARTEMIS (microelectronics); all run from desktop to HPC.

Shared layers:
  • ABLASTR library: common PIC physics
  • PICSAR: QED modules
  • Python: modules, PICMI interface, workflows; pyAMReX: object-level Python bindings, extensible, AI/ML
  • AMReX: containers, communication, portability, utilities (MPI; CUDA, OpenMP, SYCL, HIP)
  • Diagnostics, I/O & code coupling: openPMD (ADIOS2, HDF5), Ascent, VTK-m, ZFP, ...
  • Linear algebra: BLAS++, LAPACK++; FFT: on- or multi-device


30 of 32

Power-Limits Seed a Cambrian Explosion of Compute Architectures

(Figure: architecture examples, e.g. AMD, ARM.)


31 of 32

Portable Performance through Exascale Programming Model

A. Myers et al., “Porting WarpX to GPU-accelerated platforms,” Parallel Computing 108, 102833 (2021)

AMReX library

    • Domain decomposition & MPI communications: mesh refinement & load balancing
    • Performance-Portability Layer: GPU/CPU/KNL

without tiling vs. with tiling

Data Structures

  • Write the code once, specialize at compile-time

ParallelFor(/Scan/Reduce)

  • Parallel linear solvers (e.g. multi-grid Poisson solvers)
  • Embedded boundaries

  • Runtime parser for user-provided math expressions (incl. GPU)

A100 gives up to an additional ~2x


32 of 32

BLAST Codes: Transition to Exascale

Imagine a future, hybrid particle accelerator, e.g., with conventional and plasma elements.

Code types:
  • t-based PIC: electrostatic or electromagnetic, time as the independent variable (WarpX)
  • Quasi-static PIC: separates the timescales of plasma wake and beam evolution (HiPACE++)
  • s-based PIC: uses s instead of t as the independent variable, with symplectic maps for accelerator elements (ImpactX)

(Diagram: codes mapped onto the sections of such a machine: Source, (S)RF Gun, Injector, Plasma Stage (LPA/LPI), Booster, Storage Ring, cooling, IP, FEL; with an axis indicating reduced dynamics / geometry.)

Legend: BLAST Exascale codes (WarpX, HiPACE++, ImpactX); further codes in BLAST and related codes: IMPACT-T, IMPACT-Z, BeamBeam3D (ES or Vlasov), Warp, FBPIC, Wake-T, LW3D (modeling of radiative & space-charge effects), POSINST (buildup of electron clouds, secondary electron yield).

Goal: start-to-end modeling in an open software ecosystem.