1 of 24

ESCAPE Composability scenario

Giovanni Guerrieri, Enrique Garcia (CERN)

Alberto Iess, Marion Pierre, Leo Chazallet (LAPP)

for the ESCAPE Cluster

September 18th, 2025

2 of 24

ESCAPE: European Science Cluster of Astronomy and Particle Physics

Consortium of 31 members, including:

  • 10 ESFRI projects & landmarks: CTAO, EST, FAIR, HL-LHC, KM3NeT, SKAO, LSST, VIRGO, ESO, JIVE
  • 2 pan-European International Organizations: CERN and ESO
  • 2 European Research Infrastructures: EGO and JIV-ERIC
  • 4 supporting European consortia: APPEC, ASTRONET, ECFA and NuPECC

Budget: 15.98 M€

Duration: 48 months (1/2/2019 - 31/1/2023)

ESCAPE has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement no. 824064.

ESCAPE is now continuing as an Open Collaboration!

3 of 24

Previously on this channel

DESY workshop refresher: Services and Data Sources Portfolio @ ESCAPE

The ESCAPE VRE

  • AAI: A federated and reliable Authentication and Authorization layer
  • The Rucio Data Lake: A federated distributed storage solution, providing functionalities for data injection and replication through a Data Management framework. Powered by FTS.
  • REANA: A re-analysis service backed by a computing cluster that supplies the processing power to run full analyses.
  • CVMFS: A read-only file system designed to distribute software, and more.
  • JupyterHub: A notebook interface with containerised environments to hide the infrastructure’s complexity from the user.

4 of 24

Before

After

  • Rucio is a data management system meant to provide a unified way to manage data distributed across numerous, geographically separate, and technologically different storage systems.
  • It was born in the ATLAS experiment at CERN; now it is adopted by several communities in HEP and beyond.
  • Rucio is declarative: users say what they want, and Rucio figures out the details of how to do it.

Creation, transfer, and deletion of replicas of data

“I would like to delete all the datasets created for the past campaign hosted at DESY, but keep the ones at CERN. Then copy the ones at CERN to DESY.”

Management policies (usage, access, and data lifetime)

“I would like all the HIGGS samples to be read-only for users in this group, and I would like the replicas hosted at DESY to have a lifetime of 180 days.”

Workflow management systems integration (e.g. REANA)

“Whenever REANA writes outputs in this directory, I want to register these files in Rucio, keep 1 replica at CERN and have 2 replicas in Italy.”

Automation of large-scale and repetitive operational tasks

“I would like to delete all files that haven’t been accessed in over 5 months.”
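The declarative idea behind these requests can be illustrated with a toy model. This is a sketch only and does not use the Rucio API: the `plan` function, the dataset names, and the site names are hypothetical. The user states which sites should hold each dataset, and the function derives the copy and delete actions.

```python
# Toy model of declarative data management; NOT the Rucio API.
# Dataset and site names are made up for illustration.

def plan(current, desired):
    """Return the actions needed to move from `current` to `desired`.

    Both arguments map a dataset name to the set of sites holding a replica.
    """
    actions = []
    for dataset in sorted(set(current) | set(desired)):
        have = current.get(dataset, set())
        want = desired.get(dataset, set())
        # List copies before deletions, so a replica exists before one is removed.
        for site in sorted(want - have):
            actions.append(("copy", dataset, site))
        for site in sorted(have - want):
            actions.append(("delete", dataset, site))
    return actions


current = {"campaign.old": {"CERN", "DESY"}, "campaign.new": {"CERN"}}
# "Keep everything at CERN, drop the old campaign at DESY, mirror the new one."
desired = {"campaign.old": {"CERN"}, "campaign.new": {"CERN", "DESY"}}

for action in plan(current, desired):
    print(action)
```

The caller never spells out individual transfers; only the desired end state is given.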

6 of 24

REANA

7 of 24

Composability Scenario 1: ATLAS Open Data

Want to know more? Join tomorrow’s visit to ATLAS.

  • The LHC accelerates hydrogen nuclei (among other particles) to close to the speed of light
  • Bunches of protons collide at the locations of the detectors
  • The ATLAS detector records 40 million proton-proton collisions per second.

8 of 24

Composability Scenario 1: ATLAS Open Data

  • Each collision produces a complex mix of thousands of new particles
  • We analyse the characteristics of these collisions to understand how physics works at the subatomic scale (and beyond)
  • We have a good idea of what happens theoretically during the collisions

9 of 24

Composability Scenario 1: ATLAS Open Data

  • In practice, the detector only records a part of the physics interaction
  • We then need to simulate the events (according to our theory), and compare them with the output of our detectors
  • ATLAS simulates up to about 200 billion events per year (O(10) PB of data) for use in analyses
    • Other experiments follow the same strategy

10 of 24

Composability Scenario 1: ATLAS Open Data

  • Part of these data is detector-independent, i.e. not specific to ATLAS
  • We decided to open a fraction of these data to the public, to reduce wasted computing resources and to facilitate collaboration with external researchers and other experiments

Simulation chain (processing step: output):

  • GENERATION: Event generator output (EVNT)
  • SIMULATION: Simulated interaction with detector (HITS)
  • DIGITIZATION: Simulated detector output (RDO)
  • RECONSTRUCTION: Analysis object data (AOD)
  • DERIVATION: Derived AOD (DAOD)
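The benefit of sharing intermediate data can be sketched by writing the chain down as an ordered list. The step and format names are taken from the slide; the helper `remaining_steps` is hypothetical, and treating EVNT as the shared, detector-independent format is an assumption made for the example.

```python
# Sketch of the simulation chain: each processing step and its output format.
CHAIN = [
    ("GENERATION", "EVNT"),
    ("SIMULATION", "HITS"),
    ("DIGITIZATION", "RDO"),
    ("RECONSTRUCTION", "AOD"),
    ("DERIVATION", "DAOD"),
]


def remaining_steps(available_format):
    """List the steps still to run when data in `available_format` already exist."""
    formats = [fmt for _, fmt in CHAIN]
    start = formats.index(available_format) + 1  # raises ValueError for unknown formats
    return [step for step, _ in CHAIN[start:]]


# Starting from shared generator output, the generation step need not be rerun.
print(remaining_steps("EVNT"))
```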

11 of 24

Composability Scenario 1: ATLAS Open Data

Step 1: get the data

  • Rucio provides access to the whole data catalog
  • Users are already authenticated in Rucio when accessing the VRE
  • We can mount the files in the notebooks thanks to the Rucio Jupyter extension
  • If the files are not locally available, we can trigger a replication rule and bring data to the VRE.
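The "check locally, otherwise trigger a replication rule" logic can be sketched as follows. This is an illustration under assumptions, not the VRE's code: `client` is any object offering `list_replicas` and `add_replication_rule`, names modelled on the Rucio Python client whose exact signatures should be checked against the installed version. The `FakeClient` class, the RSE names, the scope, and the file name are hypothetical.

```python
# Sketch only: `client` is any object offering `list_replicas` and
# `add_replication_rule` (names modelled on the Rucio Python client).
# The RSE names, scope and file name below are hypothetical.

def ensure_local(client, scope, name, local_rse):
    """Return True if a replica is already on `local_rse`; otherwise request one."""
    did = {"scope": scope, "name": name}
    for replica in client.list_replicas([did]):
        if local_rse in replica.get("rses", {}):
            return True
    # No local replica: ask for one copy on the local storage.
    client.add_replication_rule([did], copies=1, rse_expression=local_rse)
    return False


class FakeClient:
    """Stand-in used to exercise the logic without a Rucio server."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.rules = []

    def list_replicas(self, dids):
        return [r for r in self.replicas
                if {"scope": r["scope"], "name": r["name"]} in dids]

    def add_replication_rule(self, dids, copies, rse_expression):
        self.rules.append((dids, copies, rse_expression))
        return ["rule-1"]


fake = FakeClient([{"scope": "opendata", "name": "evnt.root", "rses": {"REMOTE_RSE": []}}])
print(ensure_local(fake, "opendata", "evnt.root", "VRE_RSE"))
print(fake.rules)
```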

12 of 24

Composability Scenario 1: ATLAS Open Data

Step 2: build the analysis pipeline

  • REANA provides both workflow orchestration and reproducibility capabilities
  • Shameless ad: Bridging authentication for services in the VRE
  • We can build a REANA workflow via Snakemake
  • We can then submit the workflow from the VRE, and run on the REANA Cluster

Figure: REANA config file, Snakefile, and inputs configuration.

13 of 24

Composability Scenario 1: ATLAS Open Data

Step 3: retrieve, preserve, and publish the results

  • REANA returns the results to the VRE
  • With Rucio, researchers can re-inject their results in the Data Lake or mark them as Open Data (see the corresponding OSCARS-funded project)
  • Datasets, papers, and more can be published on Zenodo directly from within the VRE.

~ rucio opendata did -h
Usage: rucio opendata did [OPTIONS] COMMAND [ARGS]...

  Manage Opendata DIDs

Options:
  -h, --help  Show this message and exit.

Commands:
  add     Adds an existing DID to the Opendata catalog
  list    List Opendata DIDs, optionally filtered by state and...
  remove  Removes an existing Opendata DID from the Opendata catalog
  show    Get information about an Opendata DID, optionally including...
  update  Update an existing Opendata DID in the Opendata catalog.

14 of 24

Composability Scenario 2: Cherenkov Telescope Array Observatory

Presented on September 30, 2024, in Hamburg by Marion PIERRE and Frederic GILLARDO during the OSCARS Consolidation and Terminology Workshop.

Additional slides available at the end of the presentation.

Large-Sized Telescope, La Palma, Canary Islands, Spain

15 of 24

Composability Scenario 3: Gravitational Waves

  • The Einstein Telescope is a planned third-generation gravitational wave (GW) detector, part of the ESFRI 2021 Roadmap.
  • Detectable GWs are produced during violent astrophysical processes in the Universe, such as black hole collisions and core-collapse supernovae.
  • They provide information complementary to electromagnetic (EM) telescope observations.
  • GWs modify the spacetime metric locally, and are detected through interferometric measurements as variations of the optical path length.

Figure: strain amplitude versus time [s]. Image credit: LIGO/T. Pyle

16 of 24

Composability Scenario 3: Gravitational Waves

The Wavelet Detection Filter (WDF) is a pipeline for burst signal detection written in Python (Cuoco et al. 2018, Cuoco et al. 2001)

  • DOWNSAMPLING
  • WHITENING: Time-domain filtering
  • EVENT TRIGGER GENERATION: Wavelet transform
  • PARAMETER ESTIMATION: GPS, duration, frequency, signal-to-noise ratio
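The four stages can be mimicked with a toy example in plain Python. This is not the WDF implementation: whitening is replaced by simple standardisation, the wavelet transform by one level of Haar detail coefficients, and only time and signal-to-noise ratio are estimated. All function names and parameter values are assumptions chosen for illustration.

```python
# Toy illustration of the four stages; NOT the WDF implementation.
import math
import statistics


def downsample(samples, factor):
    """Keep every `factor`-th sample (a real pipeline would low-pass filter first)."""
    return samples[::factor]


def standardise(samples):
    """Crude stand-in for whitening: remove the mean and scale to unit variance."""
    mean = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [(x - mean) / sigma for x in samples]


def haar_details(samples):
    """One level of Haar wavelet detail coefficients, one per pair of samples."""
    return [(samples[i] - samples[i + 1]) / math.sqrt(2)
            for i in range(0, len(samples) - 1, 2)]


def find_triggers(samples, rate_hz, factor=2, threshold=4.0):
    """Return (time_s, snr) for each coefficient whose magnitude exceeds `threshold`."""
    reduced = standardise(downsample(samples, factor))
    reduced_rate = rate_hz / factor
    triggers = []
    for k, coeff in enumerate(haar_details(reduced)):
        if abs(coeff) > threshold:
            # Coefficient k covers reduced samples 2k and 2k + 1.
            triggers.append((2 * k / reduced_rate, abs(coeff)))
    return triggers


# Synthetic strain: a slow, small oscillation with one short burst at t = 1.25 s.
rate_hz = 64
strain = [0.1 * math.sin(0.3 * i) for i in range(2 * rate_hz)]
strain[80] += 5.0

for time_s, snr in find_triggers(strain, rate_hz):
    print(f"trigger at {time_s:.2f} s, SNR {snr:.1f}")
```

Here the time is counted from the start of the segment, standing in for the GPS time of a real trigger.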

17 of 24

Composability Scenario 3: Gravitational Waves

Scenario: run an exploratory analysis in a Jupyter notebook on the ESCAPE VRE developed at CERN

  • AAI to access JupyterHub
  • Rucio to find and attach data to the notebook
  • Run your custom analysis in the loaded WDF environment

18 of 24

Composability Scenario 3: Gravitational Waves

The notebook runs a multiprocess WDF and produces outputs in the desired folder: whitening coefficients, run parameters and event triggers.

19 of 24

Composability Scenario 3: Gravitational Waves

An example of event triggers generated by WDF.

(TIP: triggers can be used to run further analyses and exclude noise transients. Reusability is key for GW science! They could also be published on Zenodo.)

20 of 24

Composability Scenario 2: Simple demo of LAPP VRE

Link to the Zenodo publication for the notebook: https://zenodo.org/records/16881823

22 of 24

Composability Scenario 2: Cherenkov Telescope Array Observatory

Cherenkov Telescope Array Observatory:

  • Three different types of telescope:
    • SST: Small-Sized Telescope, detecting the highest-energy gamma rays
    • MST: Medium-Sized Telescope, covering the middle of the CTAO’s energy range
    • LST: Large-Sized Telescope, detecting lower-energy gamma rays
  • Data volume: 4 PB per year
  • Two “ON-SITE” locations:
    • North Site at La Palma (Canary Islands, Spain)
    • South Site at Paranal (ESO, Chile)

  • Four “OFF-SITE” data centres:
    • CSCS, Lugano in Switzerland
    • PIC, Barcelona in Spain
    • DESY, Zeuthen in Germany
    • INAF/INFN, Frascati in Italy

23 of 24

Composability Scenario 2: Cherenkov Telescope Array Observatory

Rucio in CTAO: Data Orchestration

  1. Receive an alert.
  2. Make a copy at an “OFF-SITE” location.
  3. Make a second copy at a second “OFF-SITE” location.
  4. Validate the presence of the two copies and delete the original “ON-SITE” files.
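The orchestration steps can be sketched as a small function. This is a toy model, not CTAO or Rucio code: storage is represented by in-memory sets, and the function name, site names, and file name are hypothetical.

```python
# Toy model of the four orchestration steps; NOT CTAO or Rucio code.
# Site and file names are hypothetical.

def orchestrate(filename, storage, on_site, off_sites):
    """Replicate `filename` to two off-site locations, then free the on-site copy.

    `storage` maps a site name to the set of file names it holds.
    """
    # Step 1: the alert tells us a new file is ready on site.
    if filename not in storage[on_site]:
        raise ValueError(f"{filename} not found at {on_site}")
    # Steps 2 and 3: copy to a first, then a second off-site location.
    for site in off_sites[:2]:
        storage[site].add(filename)
    # Step 4: delete the on-site file only once two copies are confirmed.
    copies = [site for site in off_sites if filename in storage[site]]
    if len(copies) >= 2:
        storage[on_site].discard(filename)
    return copies


storage = {"ON_SITE_NORTH": {"run001.dat"}, "OFF_SITE_A": set(), "OFF_SITE_B": set()}
print(orchestrate("run001.dat", storage, "ON_SITE_NORTH", ["OFF_SITE_A", "OFF_SITE_B"]))
print(storage["ON_SITE_NORTH"])
```

The on-site file is kept whenever fewer than two off-site copies can be confirmed, which is the safety property the validation step provides.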

Metadata with Rucio: Rucio was initially designed for ATLAS, with a fixed list of metadata fields for all DIDs, stored as one column per field.

The Rucio DID Meta Plugin feature accommodates the needs of different scientific communities by allowing experiments to store custom metadata in a flexible way. Plugins: DID column meta, Elasticsearch meta, JSON meta, Mongo meta, Postgres meta (used by CTAO).

*ACADA will be the central software in charge of operating the CTAO’s two arrays of telescopes in La Palma and in Chile.

24 of 24

Composability Scenario 2: Cherenkov Telescope Array Observatory

IAM is an AAI solution: the INDIGO Identity and Access Management (IAM) Service provides a layer where identities, enrollment, group membership, and other attributes and authorization policies can be managed, supporting identity federations and other authentication mechanisms (X.509 certificates and social logins).

IAM with Rucio: ESCAPE has already developed a script to synchronize users between INDIGO IAM and Rucio.

With CTAO, we are currently adapting the script to synchronize users and assign role-based permissions in Rucio based on their group memberships, such as read-only, read/write, or download access. This development is still in progress.
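The group-based assignment can be sketched as a lookup from IAM groups to permissions. This is not the ESCAPE synchronization script: the group names, the mapping, and the function names are hypothetical, and only the permission labels come from the slide.

```python
# Toy sketch of group-based role assignment; NOT the ESCAPE synchronization script.
# Group names and the mapping are hypothetical.

GROUP_PERMISSIONS = {
    "ctao/readers": {"read-only"},
    "ctao/analysts": {"read-only", "download"},
    "ctao/data-managers": {"read/write", "download"},
}


def permissions_for(user_groups):
    """Union of the permissions granted by each of the user's IAM groups."""
    granted = set()
    for group in user_groups:
        granted |= GROUP_PERMISSIONS.get(group, set())  # unknown groups grant nothing
    return granted


def sync(iam_users):
    """Map each IAM user to the permissions an account should receive."""
    return {user: sorted(permissions_for(groups)) for user, groups in iam_users.items()}


iam_users = {"alice": ["ctao/readers"], "bob": ["ctao/analysts", "ctao/data-managers"]}
print(sync(iam_users))
```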

INDIGO IAM Authentication user