1 of 16

A proposal for a EuroHPC Center of Excellence

2 of 16

EuroHPC Centres of Excellence Calls 

  • More details here


3 of 16

Aiming at a  “community” CoE 

  • A EuroHPC Center of Excellence (see here) “must demonstrate scientific and technical excellence while ensuring impact at the wider European HPC community including the European industry and/or academia”
  • There are three types of CoEs in this call:
    • Community CoEs: focused on the needs of a given community (HEP in our case), elevating their codes to better, more efficient, or newly possible use on EuroHPC JU systems.
      • 2-4 MEur as EC contribution (at 50%, so project “cost” is 4-8 MEur; see the worked example after this list); ~10 expected
    • Transversal CoEs: focused on a technological aspect (for example, workload distribution) which may be relevant to more than one scientific community.
      • 1-2.5 MEur as EC contribution (at 50%, so project “cost” is 2-5 MEur); ~4 expected
    • Lighthouse Codes: mature, production-ready codes to be elevated into common tools via documentation and reproducibility studies, eventually reaching industry. Examples are software stacks like Gromacs, which are transitioning from academia-only use to potential industrial utilization (drug discovery, etc.).
      • 1-1.5 MEur as EC contribution (at 50%, so project “cost” is 2-3 MEur); ~8 expected
  • Some examples of existing CoEs from the 2022 call
    • Materials design at the eXascale (here)
    • Scalable Parallel and distributed Astrophysical Codes for Exascale (here)
    • Center of Excellence for Exascale CFD (here)
    • Plasma Exascale-Performance Simulations CoE - Pushing flagship plasma simulations codes to tackle exascale-enabled Grand Challenges via performance optimisation and codesign (here)
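
To make the 50% funding-rate arithmetic explicit, a worked example using the top of the Community CoE range above:

    \[
    C_{\mathrm{project}} = \frac{C_{\mathrm{EC}}}{0.5},
    \qquad
    C_{\mathrm{EC}} = 4\ \mathrm{MEur}
    \;\Rightarrow\;
    C_{\mathrm{project}} = 8\ \mathrm{MEur},
    \quad
    C_{\mathrm{matching}} = C_{\mathrm{project}} - C_{\mathrm{EC}} = 4\ \mathrm{MEur}
    \]

The matching half must come from national funds (e.g. MIMIT for Italy) or the partners’ own resources, which is why the budget slides later quote each figure both with and without matching funds.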

4 of 16

Why a Centre of Excellence (CoE) for HEP

A diverse and capable community

  • 10,000 users spanning theory (e.g. Lattice QCD) to experiments
  • Proven ability to exploit large-scale computing and project future order-of-magnitude growth
  • Strong links with major EU, US, and Asian HPC centres

Strategic drivers

  • Increasing reliance on HPC and hardware accelerators (GPUs, FPGAs, TPUs)
  • Funding agencies encouraging greater use of external HPC resources
  • Emerging EuroHPC Federated Platform (EFP) and US IRI-DOE efforts standardizing access

Opportunity for leadership

  • HEP faces AI-scale data and bandwidth challenges, making it an ideal testbench for AI infrastructures
  • Acting as a coordinated community enhances influence on developing HPC/AI platforms
  • HEP’s scale, complexity, and technical excellence position it as a key partner in EuroHPC and beyond — not just a user, but a co-designer and contributor

We prepared a two-page doc: HERE

5 of 16

Forming a competitive consortium 

  • General ideas
    • Possibly led by CERN – a strong message from the community!
    • CoEs are “code based”: we need to expose a list of (HPC) codes and propose their improvement
    • We as HEP are interested in deeper integration with HPCs, including:
      • Data management, Access, Workload management, …

  • TH and/or EXP?
    • TH (especially lattice) is very advanced in the utilization of HPC systems, with multi-node codes, GPU utilization, etc.
      • We can call them “high TRL”
    • EXP is less advanced, but there is a list of codes exploring the directions we are interested in:
      • (remote) GPUs
      • Data intensive processing
      • Distributed workflows
      • AI workloads
  • We should have both!

6 of 16

Current Status

  • Members:
    • On board and “understood”:
      • CERN, INFN, Neovia, JSI, ICSC, E4, Dublin, IN2P3, PIC 
    • Under discussion
      • DESY, SURF, BSC, Juelich, SDU, GENCI
    • Associate partners/Collaborations 
      • EFP, under discussion
      • Ongoing discussion with Numpex, TGCC-CEA
  • A lot of members are willing to participate at 0 Eur (!)  

  • Budget:
    • The 50% co-funding rate is low
    • Some countries _can_ give matching funds
  • Italy:
    • Matching funds in principle available (MIMIT) *if* there is an SME
    • We have “chosen” E4 as the referent
    • Partners: INFN, ICSC, E4
    • ICSC includes CINECA and UNIMIB + we will see …

7 of 16

Possible WP Structure

WP1: project management, dissemination and outreach, collaboration and connection to other projects

WP2: scientific codes (including AI codes), organized per code / subdomain / … (domain experts + HPC/AI experts)

WP3: enabling technologies (DM, WMS, AAI, software distribution, …) (domain experts + HPC experts)

WP4: architecture testbeds and sustainability (PoCs, energy studies, architecture validation, benchmarking, …) [specific to new architectures; the HPC part sits in WP2]

Cross-cutting HPC expertise supporting all WPs: scalability experts, performance/porting experts, application domain experts

8 of 16

Possible WPs and Tasks

WP1 (PM):

  • T1.1: Project management
  • T1.2: Technical Coordination (including QA, risk, IP, …)
  • T1.3: Dissemination and outreach
  • T1.4: Collaboration and connection to other projects (including scientific boards, CoPs, industry, external stakeholders)

WP2 (Scientific codes and performance on HPC):

  • T2.1: TH codes (LQCD)
  • T2.2: Common Codes (e.g. AdePT)
  • T2.3: EXP codes (including AI)
  • T2.4: Other codes …
  • T2.5: Cross-HPC application domain / infrastructure domain experts (scalability, portability, performance optimization)

WP3 (Enabling technologies for HPC integration in Science and development of interfaces to EFP):

  • T3.1: Data management and data access
  • T3.2: Workload management (Slurm, etc.; see the batch-script sketch after this list)
  • T3.3: Access to resources (AAI, software distribution, …)
  • T3.4: AI tools on HPC: distributed training, hyperparameter optimization (HPO), etc.
  • T3.5: Interfaces to Architecture Testbeds and Demonstrators with Data Intensive Science (HEP, RA, …)
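
As a concrete illustration of the T3.2 scope, a minimal Slurm batch script for a multi-node GPU job of the kind a HEP payload would submit on a EuroHPC machine; partition, account, and payload names are placeholders, not actual project assets:

    #!/bin/bash
    #SBATCH --job-name=hep-payload    # illustrative job name
    #SBATCH --partition=gpu           # placeholder partition
    #SBATCH --account=proj_hep       # placeholder allocation account
    #SBATCH --nodes=2                 # multi-node job
    #SBATCH --ntasks-per-node=4       # one task per GPU
    #SBATCH --gpus-per-node=4
    #SBATCH --time=02:00:00

    srun python payload.py            # payload.py stands in for the real application

The interesting T3.2 work is everything around such a script: having the experiments’ workload-management systems generate, submit, and monitor these jobs automatically across heterogeneous EuroHPC sites.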

WP4 (Architecture co-design, testbeds and sustainability):

  • T4.1: Hardware testbeds (RISC-V, ARM, AI-specialized architectures), and testbeds from HPC centers
  • T4.2: Validation and benchmarking on provided testbeds (see the microbenchmark sketch after this list)
  • T4.3: Sustainability and environmental optimizations
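
For T4.2, a minimal sketch of a portable microbenchmark that could run unchanged on x86, ARM, and RISC-V testbeds; the matrix size and repetition count are arbitrary illustration choices, and a real campaign would use an established suite on top of probes like this:

    import time
    import numpy as np

    def time_once(a, b):
        """Time a single dense matrix multiply."""
        t0 = time.perf_counter()
        a @ b
        return time.perf_counter() - t0

    def matmul_gflops(n=2048, reps=5):
        """Best-of-reps achieved GFLOP/s for an n x n matmul."""
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        a @ b                           # warm-up, excluded from timing
        best = min(time_once(a, b) for _ in range(reps))
        return 2 * n**3 / best / 1e9    # 2*n^3 flops in a dense matmul

    if __name__ == "__main__":
        print(f"{matmul_gflops():.1f} GFLOP/s")

Pairing such probes with node-level power readings (where a testbed exposes them) would connect T4.2 directly to the T4.3 energy studies.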

9 of 16

The status of the proposal

  • WP leaders
    • WP1: naturally CERN (Maria Girone)
    • WP2: it is the biggest WP; we propose (up to) one leader + two co-leaders: TH/EXP/HPC
    • WP3: Daniele Spiga (INFN) proposed (expert in EXP-HEP integration, also in the INTERTWIN project)
    • WP4: need a name (or two: industrial + HPC?)
  • Task leaders
    • We need to collect expressions of interest: table

  • Still, if you have names to propose, please write to coe-hep-core@cern.ch

10 of 16

11 of 16

12 of 16

Discussions with the experiments on the codes

  • CMS (computing coordinators + A. di Florio + A. Bocci)
    • Interest in using MLPF (large-scale AI) as a flagship code (a minimal training sketch follows this list)
      • Openlab supports it; it would be a fellow at CERN
    • Interest in probing the possibility to test remote access to GPUs
      • Second priority
  • ATLAS (computing coordinators)
    • Interest in the simulation (Geant4) and in particular in the porting to GPUs (AdePT)
  • LHCb (computing coordinators + C. Bozzi)
    • Interest in the simulation (Geant4) and in particular in the porting to GPUs (AdePT)
    • (interest also in heavily multithreaded CPU simulation)
  • ALICE (computing coordinator + S. Piano)
    • Interest in executing O2 (CPU+GPU) on HPC systems
  • Lattice QCD (Sinead, Leonardo):
    • Selected one EU lattice code + 2 candidates for a possible second code
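
To give a flavour of what a “large-scale AI” flagship like MLPF needs from an HPC system, a minimal multi-GPU data-parallel training skeleton in PyTorch DDP. This is a generic sketch, not MLPF’s actual training code: the linear model and random tensors are placeholders for the real network and dataset.

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE in the environment
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Placeholder model/data; a real flagship loads its own network and dataset
        model = DDP(torch.nn.Linear(64, 8).to(local_rank), device_ids=[local_rank])
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = torch.nn.MSELoss()

        for step in range(100):
            x = torch.randn(32, 64, device=local_rank)
            y = torch.randn(32, 8, device=local_rank)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP all-reduces gradients across all ranks here
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched, for example, with torchrun --nnodes=2 --nproc_per_node=4 train.py under the batch system; the per-step gradient synchronization across ranks is exactly where multi-node scaling, and hence HPC integration, matters.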

13 of 16

The status of the proposal

  • The proposal
  • Excellence:
    • In an advanced state of writing; the biggest missing part is the CODES TABLE
    • Neovia will take a look at the Objectives, but please everyone review them!
    • I assigned some specific parts (mostly science-related) via Gdocs
  • Impact:
    • … later …
  • Implementation
    • Some general parts being written (risks, …)
    • Prepared the WP table, to be filled in by WP leaders

14 of 16

INFN people involved

  • Myself as INFN PI
  • Daniele Spiga as WP3 leader
  • Experiment side:
    • Concezio for LHCb
    • Stefano Piano for ALICE
  • Financial Officer: Simona Petronici (Pisa)

15 of 16

Italy and INFN

  • 500 kEur at INFN (say 250 kEur if we cannot rely on MIMIT)
    • In any case we must report costs for 500 kEur
    • The proposal under discussion is one person on WP3 + probably one more on 1 or 2, either Lattice or EXP
  • 500 kEur (250 kEur) at ICSC
    • Most of it at UNIMIB → Lattice
    • CINECA does not seem to want much money
    • ICSC: something (small) on dissemination
  • E4: 300 kEur (150 kEur)
    • RISC-V machines and energy-sustainability tests

16 of 16

  • … now someone just has to write it ☹