1 of 20

Data staging and caching challenges in the terabit/s era

Maiken Pedersen & Mattias Wadenstein

2026-03-17

CS3 Conference

Oslo, Norway

2 of 20

Overview

  • CERN and data
  • Nordic setup
  • Storage
  • Computing
  • LHC upgrade → more data
  • Conclusion


3 of 20

The Large Hadron Collider at CERN

  • Large: 27 km circumference
  • Hadron: The type of particles accelerated in it
  • Collider: It smashes them together


4 of 20

LHC Experiments

  • Big complex camera-equivalents
    • ALICE weighs 10000 tons and ATLAS weighs 7000 tons
    • Billions of individual sensors of various kinds
  • Taking snapshots of each collision at 40 MHz (one every 25 nanoseconds)
    • Each snapshot is a few megabytes → masses of data (see the arithmetic sketch below)
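
A rough back-of-the-envelope sketch of the raw rate implied by these numbers, assuming 2 MB per snapshot (the slide only says "a few megabytes"):

    # Raw data rate at the detector front end, before any event selection.
    # The 2 MB snapshot size is an assumption standing in for "a few megabytes".
    collision_rate_hz = 40e6        # 40 MHz, one bunch crossing every 25 ns
    snapshot_bytes = 2e6            # ~2 MB per snapshot (assumed)

    raw_rate_tb_s = collision_rate_hz * snapshot_bytes / 1e12
    print(f"Raw rate: ~{raw_rate_tb_s:.0f} TB/s")   # ~80 TB/s, hence "masses of data"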


5 of 20

Data flows

  • Hierarchy of sites:
    • Tier-0: CERN, experiments, custodial storage, active storage, CPU
    • Tier-1: A dozen sites, custodial storage, active storage, CPU
    • Tier-2: A hundred sites, active storage and CPU
  • Data flows between sites
    • Export T0→T1→T2
    • Data movement between sites
    • Writing and consolidating outputs
    • Recovery of lost files via replicas
    • Etc, etc
  • Plot: ATLAS transfer rates during DC24 (WLCG Data Challenge 2024, run at 25% of the HL-LHC target scale):


6 of 20

Computing

  • LHC computing is data intensive
    • Caches used for latency hiding and ~50% bandwidth reduction (a rough arithmetic sketch follows after the plot)
    • 2 GB/s here (shown in the plot below)
    • Representing about 10% of current Nordic capacity
    • Green: Data staged in
    • Blue: Data read by job

[Plot: ATLAS, ca 4k cores; rates of 3.5 GB/s and 2.0 GB/s marked]
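
A hedged reading of the numbers above, assuming the 3.5 GB/s is the rate jobs read from the cache and the 2.0 GB/s is what actually had to be staged in over the network (that pairing is an assumption; only the "~50% bandwidth reduction" figure is from the slide):

    # Rough arithmetic behind the "~50% bandwidth reduction" claim.
    read_by_jobs_gb_s = 3.5    # assumed: data read by jobs from the local cache
    staged_in_gb_s = 2.0       # assumed: data staged in over the network

    reduction = 1 - staged_in_gb_s / read_by_jobs_gb_s
    print(f"Bandwidth saved by caching: {reduction:.0%}")   # ~43%, i.e. roughly half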


7 of 20

WLCG and the Nordic Setup

  • Distributed Tier-1 site
    • Storage and compute at 6 sites across the Nordics
    • A Nordic collaboration started in 2002 to form a single Tier-1 site, since each country alone was too small
  • Associated Tier-2 sites in Slovenia and Switzerland
    • We integrate the storage for user convenience

[Diagram: network connecting the distributed sites]

dCache is used to manage the distributed storage


8 of 20

Size

  • One distributed dCache namespace with
    • 10 PB ALICE disk
    • 27 PB ATLAS disk
    • 45 PB tape for both experiments
  • Serving 20k-200k cores of compute with Nordugrid ARC
    • Tier-1 compute in the Nordics
    • And associated Tier-2 sites, including:
      • Backfill on EuroHPC Vega, sometimes with very many cores available


9 of 20

[Diagram: Nordugrid ARC compute middleware connecting the researcher, CERN, and compute sites 1…N]


10 of 20

Computing: ARC with data staging

  • ARC-CE can do data staging
    • Prepares all input files needed by the job before submission to the batch system
    • Saves all requested outputs to remote storage afterwards
    • Caches input files for reuse between jobs


11 of 20

Computing: ARC with data staging

  • ARC in data caching mode
    • Each job description has a list of input and output files (rucio://...)
    • The ARC CE stages all these files to local cache and links them in the session directory
    • The job is submitted to the local batch system and runs on local files only
    • Afterwards the listed output files are uploaded to main storage
    • Transfers go over HTTPS, the same path as regular data movement
  • Caches are normal shared filesystems
    • NFS, CephFS, GPFS, Lustre, etc
    • Reasonable SSD sizing for ATLAS: 20 TB + 5 TB per 1k cores (see the sizing sketch below)
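
A minimal sketch of the staging-and-cache flow described above, together with the SSD sizing rule of thumb. This is not ARC's actual code or API; all paths, function names, and the example rucio:// URL are hypothetical and only illustrate the order of the steps:

    import os

    CACHE_DIR = "/tmp/arc-cache"        # stands in for a shared filesystem (NFS, CephFS, Lustre, ...)
    SESSION_ROOT = "/tmp/arc-sessions"

    def cache_size_tb(cores: int) -> float:
        """Sizing rule of thumb from this slide: 20 TB base + 5 TB per 1000 cores."""
        return 20 + 5 * cores / 1000

    def fetch_https(url: str, dest: str) -> None:
        """Placeholder for an HTTPS download; here it only creates an empty file."""
        open(dest, "wb").close()

    def stage_and_run(job_id: str, input_urls: list[str]) -> None:
        """Hypothetical sketch of what the CE does around one job."""
        session_dir = os.path.join(SESSION_ROOT, job_id)
        os.makedirs(session_dir, exist_ok=True)
        os.makedirs(CACHE_DIR, exist_ok=True)

        # 1. Stage every input into the shared cache (skipping files already
        #    cached), then hard-link it into the job's session directory.
        for url in input_urls:
            name = url.rsplit("/", 1)[-1]
            cached = os.path.join(CACHE_DIR, name)
            if not os.path.exists(cached):
                fetch_https(url, cached)
            link = os.path.join(session_dir, name)
            if not os.path.exists(link):
                os.link(cached, link)

        # 2. The job is now submitted to the local batch system and runs on
        #    local files only; afterwards the declared outputs are uploaded
        #    back to main storage over HTTPS (both omitted in this sketch).

    stage_and_run("job-001", ["rucio://example/scope/file1.root"])
    print(f"Cache for a 4k-core ATLAS site: ~{cache_size_tb(4000):.0f} TB")  # 20 + 5*4 = 40 TB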


12 of 20

Computing: ARC with data staging

  • Overall efficiency
    • Data access is on low-latency local filesystems
    • Download before submission to the batch system → better CPU efficiency
    • E.g. 47% → 90% CPU efficiency [M Pedersen, CHEP 2019]
  • Enables computing with non-local storage
    • Like NDGF-T1 with distributed storage
    • Or a “compute only” site
  • Possible to run with limited external connectivity
    • Like HPC sites where external connectivity might be blocked or only available through a slow NAT


13 of 20

A Hexagonal Federation

  • Staging makes ARC location-agnostic
  • Configured to prefer “local” (T1) data
  • No problem getting some data to/from other sites
  • Fast internal network to keep CPUs full

[Diagram: four CEs and four DISK pools interconnected across the federation]


14 of 20

High-Luminosity upgrade

  • 2026-2030 the LHC will be upgraded
    • Plan as of today: In production by 2030-08
  • High-Luminosity LHC, or HL-LHC
  • The experiment upgrades for ATLAS (and CMS) will increase data rates by a factor of 10-12
    • ALICE has already upgraded its detector, but most of the higher data rate is absorbed at CERN and only exported to the Tier-1s in a slow trickle
  • CERN is preparing for 4.8 Tbit/s data export
    • The Nordic share is (only) about 180 Gbit/s of that


15 of 20

Data storage needs

  • Note: these graphs are from 2022; the start of HL-LHC has since shifted from 2029 to 2030


16 of 20

Computing needs

  • Projected CPU requirements
  • Blue and red depend on software development
  • More efficient code probably needs the same input data
  • Collision → paper pipeline will get 10-12x wider


17 of 20

Network needs for compute

  • Still assuming a distributed mesh network
  • If we need 100 Gbit/s per subsite now, we will need 1 Tbit/s for the High-Luminosity LHC
    • 40 Gbit/s is too slow today
    • 400 Gbit/s will be too slow in 2030
    • Will 800 Gbit/s be enough? Maybe.
  • Main storage will have to scale
    • Size: 30 PB of disk per site → 30 servers
    • Speed: 30 Gbit/s per server → 30 servers
  • Compute cache too, on fast SSDs
    • 8-12 x 100 Gbit/s cache servers? (see the arithmetic sketch below)
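
The scaling arithmetic behind these bullets, sketched out (the 10-12x growth factor and per-server rates are taken from the slides; the rounding choices are mine):

    import math

    # Per-subsite sizing sketch for HL-LHC, using the figures on this slide.
    today_gbit_s = 100                  # per-subsite network need today
    hl_lhc_factor = 10                  # data rates expected to grow 10-12x
    target_gbit_s = today_gbit_s * hl_lhc_factor        # ~1 Tbit/s per subsite

    storage_server_gbit_s = 30          # one dCache pool server
    cache_server_gbit_s = 100           # one SSD cache server

    print(math.ceil(target_gbit_s / storage_server_gbit_s))  # ~34, close to the "30 servers" above
    print(math.ceil(target_gbit_s / cache_server_gbit_s))    # 10, within the 8-12 quoted above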

[Diagram: network connecting the sites]


18 of 20

Components

[Diagram: the “Reliable / Cheap / Fast” triangle (“Choose at most two”), with dCache pools and the ARC cache placed on it]

  • Funding agencies require “cheap”
  • Main long-term storage: reliable & cheap
  • Cache: fast & cheap
    • Failure just breaks currently running jobs


19 of 20

Conclusion

  • Low-latency local caches are essential to compute efficiency on data-intensive loads
  • Not requiring both high performance and high reliability makes it possible to buy cheap storage
  • Horizontal scaling makes a big increase in data rates possible


20 of 20

Questions?
