1 of 20

Data staging and caching challenges in the terabit/s era

Maiken Pedersen & Mattias Wadenstein

2026-03-17

CS3 Conference

Oslo, Norway

2 of 20

Overview

  • CERN and data
  • Nordic setup
  • Storage
  • Computing
  • LHC upgrade → more data
  • Conclusion


3 of 20

The Large Hadron Collider at CERN

  • Large: 27 km circumference
  • Hadron: The type of particles accelerated in it
  • Collider: It smashes them together


4 of 20

LHC Experiments

  • Big complex camera-equivalents
    • ALICE weighs 10000 tons and ATLAS weighs 7000 tons
    • Billions of individual sensors of various kinds
  • Taking snapshots of each collision at 40 MHz (one every 25 nanoseconds)
    • Each snapshot is a few megabytes → masses of data (see the arithmetic sketch below)
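
A rough back-of-the-envelope sketch of the raw rate implied by these numbers, assuming 2 MB per snapshot (the slide only says "a few megabytes"):

    # Raw data rate at the detector front end, before any event selection.
    # The 2 MB snapshot size is an assumption standing in for "a few megabytes".
    collision_rate_hz = 40e6        # 40 MHz, one bunch crossing every 25 ns
    snapshot_bytes = 2e6            # ~2 MB per snapshot (assumed)

    raw_rate_tb_s = collision_rate_hz * snapshot_bytes / 1e12
    print(f"Raw rate: ~{raw_rate_tb_s:.0f} TB/s")   # ~80 TB/s, hence "masses of data"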


5 of 20

Data flows

  • Hierarchy of sites:
    • Tier-0: CERN, experiments, custodial storage, active storage, CPU
    • Tier-1: A dozen sites, custodial storage, active storage, CPU
    • Tier-2: A hundred sites, active storage and CPU
  • Data flows between sites
    • Export T0→T1→T2
    • Data movement between sites
    • Writing and consolidating outputs
    • Recovery of lost files via replicas
    • Etc, etc
  • Plot: ATLAS transfer rates during DC24 (WLCG Data Challenge 2024, run at 25% of the HL-LHC target scale):


6 of 20

Computing

  • LHC computing is data intensive
    • Caches used for latency hiding and ~50% bandwidth reduction (a rough arithmetic sketch follows after the plot)
    • 2 GB/s here (shown in the plot below)
    • Representing about 10% of current Nordic capacity
    • Green: Data staged in
    • Blue: Data read by job

[Plot: ATLAS, ca 4k cores; rates of 3.5 GB/s and 2.0 GB/s marked]
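
A hedged reading of the numbers above, assuming the 3.5 GB/s is the rate jobs read from the cache and the 2.0 GB/s is what actually had to be staged in over the network (that pairing is an assumption; only the "~50% bandwidth reduction" figure is from the slide):

    # Rough arithmetic behind the "~50% bandwidth reduction" claim.
    read_by_jobs_gb_s = 3.5    # assumed: data read by jobs from the local cache
    staged_in_gb_s = 2.0       # assumed: data staged in over the network

    reduction = 1 - staged_in_gb_s / read_by_jobs_gb_s
    print(f"Bandwidth saved by caching: {reduction:.0%}")   # ~43%, i.e. roughly half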


7 of 20

WLCG and the Nordic Setup

  • Distributed Tier-1 site
    • Storage and compute at 6 sites across the Nordics
    • A Nordic collaboration started in 2002 to form a single Tier-1 site, since each country alone was too small
  • Associated Tier-2 sites in Slovenia and Switzerland
    • We integrate the storage for user convenience

[Diagram: network connecting the distributed sites]

dCache is used to manage the distributed storage


8 of 20

Size

  • One distributed dCache namespace with
    • 10 PB ALICE disk
    • 27 PB ATLAS disk
    • 45 PB tape for both experiments
  • Serving 20k-200k cores of compute with Nordugrid ARC
    • Tier-1 compute in the Nordics
    • And associated Tier-2 sites, including:
      • Backfill on EuroHPC Vega, sometimes with very many cores available


9 of 20

[Diagram: Nordugrid ARC compute middleware connecting the researcher, CERN, and compute sites 1…N]


10 of 20

Computing: ARC with data staging

  • ARC-CE can do data staging
    • Prepares all input files needed by the job before submission to the batch system
    • Saves all requested outputs to remote storage afterwards
    • Caches input files for reuse between jobs


11 of 20

Computing: ARC with data staging

  • ARC in data caching mode
    • Each job description has a list of input and output files (rucio://...)
    • The ARC CE stages all these files to local cache and links them in the session directory
    • The job is submitted to the local batch system and runs on local files only
    • Afterwards the listed output files are uploaded to main storage
    • Transfers go over HTTPS, the same path as regular data movement
  • Caches are normal shared filesystems
    • NFS, CephFS, GPFS, Lustre, etc
    • Reasonable SSD sizing for ATLAS: 20 TB + 5 TB per 1k cores (see the sizing sketch below)
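
A minimal sketch of the staging-and-cache flow described above, together with the SSD sizing rule of thumb. This is not ARC's actual code or API; all paths, function names, and the example rucio:// URL are hypothetical and only illustrate the order of the steps:

    import os

    CACHE_DIR = "/tmp/arc-cache"        # stands in for a shared filesystem (NFS, CephFS, Lustre, ...)
    SESSION_ROOT = "/tmp/arc-sessions"

    def cache_size_tb(cores: int) -> float:
        """Sizing rule of thumb from this slide: 20 TB base + 5 TB per 1000 cores."""
        return 20 + 5 * cores / 1000

    def fetch_https(url: str, dest: str) -> None:
        """Placeholder for an HTTPS download; here it only creates an empty file."""
        open(dest, "wb").close()

    def stage_and_run(job_id: str, input_urls: list[str]) -> None:
        """Hypothetical sketch of what the CE does around one job."""
        session_dir = os.path.join(SESSION_ROOT, job_id)
        os.makedirs(session_dir, exist_ok=True)
        os.makedirs(CACHE_DIR, exist_ok=True)

        # 1. Stage every input into the shared cache (skipping files already
        #    cached), then hard-link it into the job's session directory.
        for url in input_urls:
            name = url.rsplit("/", 1)[-1]
            cached = os.path.join(CACHE_DIR, name)
            if not os.path.exists(cached):
                fetch_https(url, cached)
            link = os.path.join(session_dir, name)
            if not os.path.exists(link):
                os.link(cached, link)

        # 2. The job is now submitted to the local batch system and runs on
        #    local files only; afterwards the declared outputs are uploaded
        #    back to main storage over HTTPS (both omitted in this sketch).

    stage_and_run("job-001", ["rucio://example/scope/file1.root"])
    print(f"Cache for a 4k-core ATLAS site: ~{cache_size_tb(4000):.0f} TB")  # 20 + 5*4 = 40 TB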


12 of 20

Computing: ARC with data staging

  • Overall efficiency
    • Data access is on low-latency local filesystems
    • Download before submission to the batch system → better CPU efficiency
    • E.g. 47% → 90% CPU efficiency [M Pedersen, CHEP 2019]
  • Enables computing with non-local storage
    • Like NDGF-T1 with distributed storage
    • Or a “compute only” site
  • Possible to run with limited external connectivity
    • Like HPC sites where external connectivity might be blocked or only available through a slow NAT


13 of 20

A Hexagonal Federation

  • Staging makes ARC location-agnostic
  • Configured to prefer “local” (T1) data
  • No problem getting some data to/from other sites
  • Fast internal network to keep CPUs full

[Diagram: four CEs and four DISK pools interconnected across the federation]


14 of 20

High-Luminosity upgrade

  • 2026-2030 the LHC will be upgraded
    • Plan as of today: In production by 2030-08
  • High-Luminosity LHC, or HL-LHC
  • The experiment upgrades for ATLAS (and CMS) will increase data rates by a factor of 10-12
    • ALICE has already upgraded its detector, but most of the higher data rate is absorbed at CERN and only exported to the Tier-1s in a slow trickle
  • CERN is preparing for 4.8 Tbit/s data export
    • The Nordic share is (only) about 180 Gbit/s of that


15 of 20

Data storage needs

  • Note: these graphs are from 2022; the start of HL-LHC has since shifted from 2029 to 2030


16 of 20

Computing needs

  • Projected CPU requirements
  • Blue and red depend on software development
  • More efficient code probably needs the same input data
  • Collision → paper pipeline will get 10-12x wider


17 of 20

Network needs for compute

  • Still assuming a distributed mesh network
  • If we need 100 Gbit/s per subsite now, we will need 1 Tbit/s for the High-Luminosity LHC
    • 40 Gbit/s is too slow today
    • 400 Gbit/s will be too slow in 2030
    • Will 800 Gbit/s be enough? Maybe.
  • Main storage will have to scale
    • Size: 30 PB of disk per site → 30 servers
    • Speed: 30 Gbit/s per server → 30 servers
  • Compute cache too, on fast SSDs
    • 8-12 x 100 Gbit/s cache servers? (see the arithmetic sketch below)
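
The scaling arithmetic behind these bullets, sketched out (the 10-12x growth factor and per-server rates are taken from the slides; the rounding choices are mine):

    import math

    # Per-subsite sizing sketch for HL-LHC, using the figures on this slide.
    today_gbit_s = 100                  # per-subsite network need today
    hl_lhc_factor = 10                  # data rates expected to grow 10-12x
    target_gbit_s = today_gbit_s * hl_lhc_factor        # ~1 Tbit/s per subsite

    storage_server_gbit_s = 30          # one dCache pool server
    cache_server_gbit_s = 100           # one SSD cache server

    print(math.ceil(target_gbit_s / storage_server_gbit_s))  # ~34, close to the "30 servers" above
    print(math.ceil(target_gbit_s / cache_server_gbit_s))    # 10, within the 8-12 quoted above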

[Diagram: network connecting the sites]


18 of 20

Components

[Diagram: the “Reliable / Cheap / Fast” triangle (“Choose at most two”), with dCache pools and the ARC cache placed on it]

  • Funding agencies require “cheap”
  • Main long-term storage: reliable & cheap
  • Cache: fast & cheap
    • Failure just breaks currently running jobs


19 of 20

Conclusion

  • Low-latency local caches are essential to compute efficiency on data-intensive loads
  • Not requiring both high performance and high reliability makes it possible to buy cheap storage
  • Horizontal scaling makes a big increase in data rates possible


20 of 20

Questions?
