1 of 16

A proposal for a EuroHPC Center of Excellence

2 of 16

EuroHPC Centres of Excellence Calls 

  • More details here


3 of 16

Aiming at a  “community” CoE 

  • A EuroHPC Center of Excellence (see here) “must demonstrate scientific and technical excellence while ensuring impact at the wider European HPC community including the European industry and/or academia”
  • There are three types of CoEs in this call:
    • Community CoEs: focused on the needs of a given community (HEP in our case), elevating their codes to better, more efficient, or newly possible use on EuroHPC JU systems.
      • 2-4 MEur as EC contribution (at 50%, so project “cost” is 4-8 MEur; see the worked example after this list); ~10 expected
    • Transversal CoEs: focused on a technological aspect (for example, workload distribution) which may be relevant to more than one scientific community.
      • 1-2.5 MEur as EC contribution (at 50%, so project “cost” is 2-5 MEur); ~4 expected
    • Lighthouse Codes: mature, production-ready codes to be elevated into common tools via documentation and reproducibility studies, eventually reaching industry. Examples are software stacks like Gromacs, which are transitioning from academia-only use to potential industrial utilization (drug discovery, etc.).
      • 1-1.5 MEur as EC contribution (at 50%, so project “cost” is 2-3 MEur); ~8 expected
  • Some examples of existing CoEs from the 2022 call
    • Materials design at the eXascale (here)
    • Scalable Parallel and distributed Astrophysical Codes for Exascale (here)
    • Center of Excellence for Exascale CFD (here)
    • Plasma Exascale-Performance Simulations CoE - Pushing flagship plasma simulations codes to tackle exascale-enabled Grand Challenges via performance optimisation and codesign (here)
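
To make the 50% funding-rate arithmetic explicit, a worked example using the top of the Community CoE range above:

    \[
    C_{\mathrm{project}} = \frac{C_{\mathrm{EC}}}{0.5},
    \qquad
    C_{\mathrm{EC}} = 4\ \mathrm{MEur}
    \;\Rightarrow\;
    C_{\mathrm{project}} = 8\ \mathrm{MEur},
    \quad
    C_{\mathrm{matching}} = C_{\mathrm{project}} - C_{\mathrm{EC}} = 4\ \mathrm{MEur}
    \]

The matching half must come from national funds (e.g. MIMIT for Italy) or the partners’ own resources, which is why the budget slides later quote each figure both with and without matching funds.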

4 of 16

Why a Centre of Excellence (CoE) for HEP

A diverse and capable community

  • 10,000 users spanning theory (e.g. Lattice QCD) to experiments
  • Proven ability to exploit large-scale computing and project future order-of-magnitude growth
  • Strong links with major EU, US, and Asian HPC centres

Strategic drivers

  • Increasing reliance on HPC and hardware accelerators (GPUs, FPGAs, TPUs)
  • Funding agencies encouraging greater use of external HPC resources
  • Emerging EuroHPC Federated Platform (EFP) and US IRI-DOE efforts standardizing access

Opportunity for leadership

  • HEP faces AI-scale data and bandwidth challenges, making it an ideal testbench for AI infrastructures
  • Acting as a coordinated community enhances influence on developing HPC/AI platforms
  • HEP’s scale, complexity, and technical excellence position it as a key partner in EuroHPC and beyond — not just a user, but a co-designer and contributor

We prepared a two-page doc: HERE

5 of 16

Forming a competitive consortium 

  • General ideas
    • Possibly led by CERN – a strong message from the community!
    • CoEs are “code based”: we need to expose a list of (HPC) codes and propose their improvement
    • We as HEP are interested in deeper integration with HPCs, including:
      • Data management, Access, Workload management, …

  • TH and/or EXP?
    • TH (especially lattice) is very advanced in the utilization of HPC systems, with multi-node codes, GPU utilization, etc.
      • We can call them “high TRL”
    • EXP is less advanced, but there is a list of codes exploring the directions we are interested in:
      • (remote) GPUs
      • Data intensive processing
      • Distributed workflows
      • AI workloads
  • We should have both!

6 of 16

Current Status

  • Members:
    • On board and “understood”:
      • CERN, INFN, Neovia, JSI, ICSC, E4, Dublin, IN2P3, PIC 
    • Under discussion
      • DESY, SURF, BSC, Juelich, SDU, GENCI
    • Associate partners/Collaborations 
      • EFP, under discussion
      • Ongoing discussion with Numpex, TGCC-CEA
  • A lot of members are willing to participate at 0 Eur (!)  

  • Budget:
    • The 50% co-funding rate is low
    • Some countries _can_ give matching funds
  • Italy:
    • Matching funds in principle available (MIMIT) *if* there is an SME
    • We have “chosen” E4 as the referent
    • Partners: INFN, ICSC, E4
    • ICSC includes CINECA and UNIMIB + we will see …

7 of 16

Possible WP Structure

WP1: project management, dissemination and outreach, collaboration and connection to other projects

WP2: scientific codes (including AI codes), organized per code / subdomain / … (domain experts + HPC/AI experts)

WP3: enabling technologies (DM, WMS, AAI, software distribution, …) (domain experts + HPC experts)

WP4: architecture testbeds and sustainability (PoCs, energy studies, architecture validation, benchmarking, …) [specific to new architectures; the HPC part sits in WP2]

Cross-cutting HPC expertise supporting all WPs: scalability experts, performance/porting experts, application domain experts

8 of 16

Possible WPs and Tasks

WP1 (PM):

  • T1.1: Project management
  • T1.2: Technical Coordination (including QA, risk, IP, …)
  • T1.3: Dissemination and outreach
  • T1.4: Collaboration and connection to other projects (including scientific boards, CoPs, industry, external stakeholders)

WP2 (Scientific codes and performance on HPC):

  • T2.1: TH codes (LQCD)
  • T2.2: Common Codes (e.g. AdePT)
  • T2.3: EXP codes (including AI)
  • T2.4: Other codes …
  • T2.5: Cross-HPC application domain / infrastructure domain experts (scalability, portability, performance optimization)

WP3 (Enabling technologies for HPC integration in Science and development of interfaces to EFP):

  • T3.1: Data management and data access
  • T3.2: Workload management (Slurm, etc.; see the batch-script sketch after this list)
  • T3.3: Access to resources (AAI, software distribution, …)
  • T3.4: AI tools on HPC: distributed training, hyperparameter optimization (HPO), etc.
  • T3.5: Interfaces to Architecture Testbeds and Demonstrators with Data Intensive Science (HEP, RA, …)
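
As a concrete illustration of the T3.2 scope, a minimal Slurm batch script for a multi-node GPU job of the kind a HEP payload would submit on a EuroHPC machine; partition, account, and payload names are placeholders, not actual project assets:

    #!/bin/bash
    #SBATCH --job-name=hep-payload    # illustrative job name
    #SBATCH --partition=gpu           # placeholder partition
    #SBATCH --account=proj_hep       # placeholder allocation account
    #SBATCH --nodes=2                 # multi-node job
    #SBATCH --ntasks-per-node=4       # one task per GPU
    #SBATCH --gpus-per-node=4
    #SBATCH --time=02:00:00

    srun python payload.py            # payload.py stands in for the real application

The interesting T3.2 work is everything around such a script: having the experiments’ workload-management systems generate, submit, and monitor these jobs automatically across heterogeneous EuroHPC sites.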

WP4 (Architecture co-design, testbeds and sustainability):

  • T4.1: Hardware testbeds (RISC-V, ARM, AI-specialized architectures), and testbeds from HPC centers
  • T4.2: Validation and benchmarking on provided testbeds (see the microbenchmark sketch after this list)
  • T4.3: Sustainability and environmental optimizations
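
For T4.2, a minimal sketch of a portable microbenchmark that could run unchanged on x86, ARM, and RISC-V testbeds; the matrix size and repetition count are arbitrary illustration choices, and a real campaign would use an established suite on top of probes like this:

    import time
    import numpy as np

    def time_once(a, b):
        """Time a single dense matrix multiply."""
        t0 = time.perf_counter()
        a @ b
        return time.perf_counter() - t0

    def matmul_gflops(n=2048, reps=5):
        """Best-of-reps achieved GFLOP/s for an n x n matmul."""
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        a @ b                           # warm-up, excluded from timing
        best = min(time_once(a, b) for _ in range(reps))
        return 2 * n**3 / best / 1e9    # 2*n^3 flops in a dense matmul

    if __name__ == "__main__":
        print(f"{matmul_gflops():.1f} GFLOP/s")

Pairing such probes with node-level power readings (where a testbed exposes them) would connect T4.2 directly to the T4.3 energy studies.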

9 of 16

The status of the proposal

  • WP leaders
    • WP1: naturally CERN (Maria Girone)
    • WP2: it is the biggest WP; we propose (up to) one leader + two co-leaders: TH/EXP/HPC
    • WP3: Daniele Spiga (INFN) proposed (expert in EXP-HEP integration, also in the INTERTWIN project)
    • WP4: need a name (or two: industrial + HPC?)
  • Task leaders
    • We need to collect expressions of interest: table

  • Still, if you have names to propose, please write to coe-hep-core@cern.ch

10 of 16

11 of 16

12 of 16

Discussions with the experiments on the codes

  • CMS (computing coordinators + A. di Florio + A. Bocci)
    • Interest in using MLPF (large-scale AI) as a flagship code (a minimal training sketch follows this list)
      • Openlab supports it; it would be a fellow at CERN
    • Interest in probing the possibility to test remote access to GPUs
      • Second priority
  • ATLAS (computing coordinators)
    • Interest in the simulation (Geant4) and in particular in the porting to GPUs (AdePT)
  • LHCb (computing coordinators + C. Bozzi)
    • Interest in the simulation (Geant4) and in particular in the porting to GPUs (AdePT)
    • (interest also in heavily multithreaded CPU simulation)
  • ALICE (computing coordinator + S. Piano)
    • Interest in executing O2 (CPU+GPU) on HPC systems
  • Lattice QCD (Sinead, Leonardo):
    • Selected one EU lattice code + 2 candidates for a possible second code
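
To give a flavour of what a “large-scale AI” flagship like MLPF needs from an HPC system, a minimal multi-GPU data-parallel training skeleton in PyTorch DDP. This is a generic sketch, not MLPF’s actual training code: the linear model and random tensors are placeholders for the real network and dataset.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE in the environment
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Placeholder model/data; a real flagship loads its own network and dataset
        model = DDP(torch.nn.Linear(64, 8).to(local_rank), device_ids=[local_rank])
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = torch.nn.MSELoss()

        for step in range(100):
            x = torch.randn(32, 64, device=local_rank)
            y = torch.randn(32, 8, device=local_rank)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP all-reduces gradients across all ranks here
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched, for example, with torchrun --nnodes=2 --nproc_per_node=4 train.py under the batch system; the per-step gradient synchronization across ranks is exactly where multi-node scaling, and hence HPC integration, matters.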

13 of 16

The status of the proposal

  • The proposal
  • Excellence:
    • In an advanced state of writing; the biggest missing part is the CODES TABLE
    • Neovia will take a look at the Objectives, but please everyone review them!
    • I assigned some specific parts (mostly science-related) via Gdocs
  • Impact:
    • … later …
  • Implementation
    • Some general parts being written (risks, …)
    • Prepared the WP table, to be filled in by WP leaders

14 of 16

INFN people involved

  • Myself as INFN PI
  • Daniele Spiga as WP3 leader
  • Experiment side:
    • Concezio for LHCb
    • Stefano Piano for ALICE
  • Financial Officer: Simona Petronici (Pisa)

15 of 16

Italy and INFN

  • 500 kEur at INFN (say 250 kEur if we cannot rely on MIMIT)
    • In any case we must report costs for 500 kEur
    • The proposal under discussion is one person on WP3 + probably one more on 1 or 2, either Lattice or EXP
  • 500 kEur (250 kEur) at ICSC
    • Most of it at UNIMIB → Lattice
    • CINECA does not seem to want much money
    • ICSC: something (small) on dissemination
  • E4: 300 kEur (150 kEur)
    • RISC-V machines and energy-sustainability tests

16 of 16

  • … now someone just has to write it ☹