1 of 9

Open Data Facility

Design Preview

v0.2 — for the IRIS-HEP SSL team

May 7, 2026

Aidan • Fengping • David • Farnaz • Judith

Rob Gardner · UChicago / EFI · May 7, 2026 · with AI assistance from Claude

2 of 9

THE FRAME

Why ODF, why RP1, why now

Why ODF

  • Workshop themes converging: mirror-not-copy, three communities, AI-ready formats
  • CERN Open Data portal (5 PB) is rate-limited, can't carry AI workloads
  • HEP-ML developers + classrooms have no first-class home today

Why RP1

  • RP1 is already an R&D / integration platform — the ODF materializes it
  • Exercises every RP1 capability: federation, scoped IAM, BinderHub, IaaS GPU
  • Gives RP1 its first external constituency and its first AF-comparison surface

Why now

  • Workshop momentum: white paper drafting, US-side mirror conversation
  • AmSC / IRI / NRP plumbing is under-built; RP1 is positioned to demonstrate it
  • Agent stack maturing (IRIS-HEP); REANA already deployed and well-liked on AF

Open Data Facility on RP1 — Design Preview v0.2

2 / 9

3 of 9

DESIGN

Four conceptual principles

Multiple interfaces, one substrate

BinderHub, JupyterHub, REANA — and a forward path to IRI-mediated and agentic-ready surfaces. Same backend, same data, same IAM.

Mirror, not copy

A US-side mirror of the CERN-curated open data with provenance preserved back to Zenodo / experiment DOI. We serve, we don't republish.

Capability without accounts

Three IAM-scoped tiers — Public, Vetted, Anointed — built on institutional SSO. No AF Unix accounts required.

Do good first, do better next week

Phased delivery driven by realized use rather than projected scale. Hardware in the margins; the real ask is engineering effort.

4 of 9

REQUIREMENTS

Six use cases driving the design

From the November 2025 ATLAS Open Data tutorial. Every architectural choice serves these.

6.1

NTuple education / outreach

Stack A • BinderHub → JupyterHub

6.2

PHYSLITE columnar at scale

Stack B + A • JupyterHub + REANA

6.3

Statistical inference (pyhf)

Stack A • REANA (GPU-optional)

6.4

ML on HEP data (BDT / DNN / GNN)

Stack A • JupyterHub + REANA • GPU

6.5

MC event generation

Stack B • REANA + Apptainer

6.6

Systematics / PHYSLITEtoOpenData

Stack B • REANA + AnalysisBase

REANA is listed for 5 of the 6 use cases and is the only interface for 3 of them; that's why REANA is a peer interface, not a luxury add-on.
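
The use-case cards above can be read as a small lookup table. A minimal Python sketch that transcribes the stack and interface labels from the cards; the names `USE_CASES` and `use_cases_for` are illustrative and not part of any ODF code. Apptainer and AnalysisBase appear on cards 6.5 and 6.6 but are a runtime and a software release, so they are not listed as interfaces here.

```python
# Use cases from the cards above: id -> (title, stacks, interfaces).
USE_CASES = {
    "6.1": ("NTuple education / outreach", ("A",), ("BinderHub", "JupyterHub")),
    "6.2": ("PHYSLITE columnar at scale", ("B", "A"), ("JupyterHub", "REANA")),
    "6.3": ("Statistical inference (pyhf)", ("A",), ("REANA",)),
    "6.4": ("ML on HEP data (BDT / DNN / GNN)", ("A",), ("JupyterHub", "REANA")),
    "6.5": ("MC event generation", ("B",), ("REANA",)),
    "6.6": ("Systematics / PHYSLITEtoOpenData", ("B",), ("REANA",)),
}


def use_cases_for(interface):
    """Return the ids of use cases whose card lists the given interface."""
    return sorted(
        uc for uc, (_title, _stacks, ifaces) in USE_CASES.items() if interface in ifaces
    )


for name in ("BinderHub", "JupyterHub", "REANA"):
    print(name, "->", use_cases_for(name))
```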

5 of 9

USERS

Three communities, three tiers

Out-of-band researcher

WORKLOAD

NTuple H→γγ exploration, individual ML experiments

TIER

Public → Vetted

INTERFACES

BinderHub → JupyterHub; REANA for re-runs

COMPUTE

2–8 cores · 0–1 GPU on request

STORAGE

0 GB (Public) / 50 GB home (Vetted)

Workshop / Classroom (~30–40 students)

WORKLOAD

Tutorial notebooks, paced exercises

TIER

Vetted (service-account)

INTERFACES

BinderHub (ephemeral) · JupyterHub (multi-day)

COMPUTE

2 cores/user · ≤1 shared GPU/session

STORAGE

5–10 GB scratch / student

HEP-ML developer

WORKLOAD

FM pre-training, BDT/GNN training, columnar pipelines

TIER

Vetted → Anointed

INTERFACES

JupyterHub (dev) + REANA + agentic surfaces

COMPUTE

Burst Dask cluster · 1–8 GPUs

STORAGE

50 GB home + ≥10 TB project
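
For capacity planning, the classroom column implies a simple envelope. A back-of-the-envelope sketch in Python using only numbers from this slide (30–40 students, 2 cores per user, 5–10 GB scratch per student); the function name is illustrative.

```python
def classroom_envelope(students, cores_per_user=2, scratch_gb_per_student=(5, 10)):
    """Return (total cores, (min scratch GB, max scratch GB)) for one session."""
    low_gb, high_gb = scratch_gb_per_student
    return students * cores_per_user, (students * low_gb, students * high_gb)


for n in (30, 40):
    cores, (low, high) = classroom_envelope(n)
    print(f"{n} students: {cores} cores, {low}-{high} GB scratch")
```

Under these numbers a full classroom needs roughly 60–80 cores and 150–400 GB of scratch, in addition to the shared GPU.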

6 of 9

CAPABILITIES

Four interfaces over one substrate

BinderHub

Ephemeral · public · lowest friction

Stateless containers built on demand.

For users who click 'launch' on a tutorial link.

Public • Vetted

JupyterHub

Interactive · persistent · AF-like

Tier-aware spawner profiles, persistent home, project scratch on EOS. The dev surface.

Vetted • Anointed
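
One way "tier-aware spawner profiles" could look is a function that returns entries in the shape of KubeSpawner's `profile_list` option. This is a sketch only: the tier names come from this deck and the core and GPU counts from the users slide, but which tier sees which profile, the slugs, and the function name are assumptions made for illustration.

```python
def profiles_for_tier(tier):
    """Build KubeSpawner-style profile dicts for an ODF tier (hypothetical policy)."""
    small = {
        "display_name": "Small (2 cores)",
        "slug": "small",
        "default": True,
        "kubespawner_override": {"cpu_limit": 2},
    }
    large = {
        "display_name": "Large (8 cores)",
        "slug": "large",
        "kubespawner_override": {"cpu_limit": 8},
    }
    gpu = {
        "display_name": "GPU (8 cores, 1 GPU)",
        "slug": "gpu",
        "kubespawner_override": {
            "cpu_limit": 8,
            "extra_resource_limits": {"nvidia.com/gpu": "1"},
        },
    }
    # Assumed policy: the GPU profile is shown to Anointed users only.
    by_tier = {
        "Vetted": [small, large],
        "Anointed": [small, large, gpu],
    }
    # The JupyterHub card lists only Vetted and Anointed, so Public gets no profiles.
    return by_tier.get(tier, [])


print([p["slug"] for p in profiles_for_tier("Anointed")])
```

In a real `jupyterhub_config.py` the tier would presumably be derived from the user's IAM claims, and KubeSpawner accepts a callable for `profile_list`, so a function like this could be wrapped to take the spawner as its argument.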

REANA

Declarative · reproducible · async

Pinned-container workflows in YAML.

Right surface for production analyses.

Vetted • Anointed
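
To make "pinned-container workflows in YAML" concrete, here is a Python sketch of a REANA Serial workflow specification written as a dict with the same layout as a `reana.yaml` file, plus a small check that every step's image is pinned. The file names, image reference, and command are placeholders, and `is_pinned` and `unpinned_steps` are hypothetical helpers, not part of REANA.

```python
# Dict mirroring the layout of a reana.yaml file for a Serial workflow.
# File names, image, and command are placeholders.
SPEC = {
    "inputs": {"files": ["analysis.py"]},
    "workflow": {
        "type": "serial",
        "specification": {
            "steps": [
                {
                    "name": "fit",
                    "environment": "docker.io/library/python:3.11.9",
                    "commands": ["python analysis.py > results.txt"],
                }
            ]
        },
    },
    "outputs": {"files": ["results.txt"]},
}


def is_pinned(image):
    """True if the image has a digest or an explicit tag other than 'latest'."""
    if "@sha256:" in image:
        return True
    # Drop registry/namespace so a registry port is not mistaken for a tag.
    name = image.rsplit("/", 1)[-1]
    _, sep, tag = name.partition(":")
    return bool(sep) and tag not in ("", "latest")


def unpinned_steps(spec):
    """Return the names of steps whose environment image is not pinned."""
    steps = spec["workflow"]["specification"]["steps"]
    return [s.get("name", "?") for s in steps if not is_pinned(s["environment"])]


print(unpinned_steps(SPEC))
```

A check like this could run before a workflow is accepted into a shared library, so that re-runs resolve to the same container.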

Future: Intelligent (IRI / AmSC) + Agentic (MCP · CLI · SKILL)

Programmatic API for AmSC clients to drive ODF as an IRI service · opendata-mcp + reana-mcp + SKILL marketplace at rp1.hl-lhc.io/skills

Not every combination composes — see §4.8 of the design doc for the tensions table (containers vs interactive, agentic vs Public-tier security, sync vs async UX).
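
The tier labels on the three interface cards above amount to a small access table. A minimal Python sketch of that table; the names `Tier`, `INTERFACE_TIERS`, and `may_use` are illustrative, and a real deployment would derive the tier from institutional SSO claims rather than pass it in directly.

```python
from enum import Enum


class Tier(Enum):
    PUBLIC = "Public"
    VETTED = "Vetted"
    ANOINTED = "Anointed"


# Tier labels as printed on the interface cards.
INTERFACE_TIERS = {
    "BinderHub": frozenset({Tier.PUBLIC, Tier.VETTED}),
    "JupyterHub": frozenset({Tier.VETTED, Tier.ANOINTED}),
    "REANA": frozenset({Tier.VETTED, Tier.ANOINTED}),
}


def may_use(tier, interface):
    """True if the card for `interface` lists `tier`; unknown interfaces are denied."""
    return tier in INTERFACE_TIERS.get(interface, frozenset())


def interfaces_for(tier):
    """Return the interfaces whose card lists the given tier."""
    return sorted(name for name, tiers in INTERFACE_TIERS.items() if tier in tiers)


for tier in Tier:
    print(tier.value, "->", interfaces_for(tier))
```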

7 of 9

SHAPE

Architecture at a glance

INTERFACES

BinderHub

JupyterHub

REANA

(future) Agentic / IRI

OWNER

Aidan & Fengping

COMPUTE & SCHEDULING

RP1 K8s

Kueue

Dask-Gateway

HTCondor

ServiceX/Y

OWNER

Aidan & Fengping

DATA PLANE

MWT2_OPENDATA (dCache / Rucio)

atlasopenmagic (metadata mirror)

User outputs (EOS potentially)

OWNER

Judith

EQUIPMENT & NET

ODF compute pool

GPU pool (A100)

≥25 Gbps fabric

CVMFS

OWNER

David & Farnaz

FEDERATION

UC primary ↔ IU stretched-K8s ↔ Tempest / Pile ↔ NRP / AmSC burst

8 of 9

ROADMAP

Phased plan

Phase 0

Foundation · 0–2 mo

KEY WORK PACKAGES

  • WP-1 storage
  • WP-4 equipment
  • WP-6 IAM + router
  • WP-12 Stack A
  • WP-16 REANA on RP1

EXIT CRITERION

Internal user runs H→γγ end-to-end against the local mirror — as a notebook AND as a REANA workflow

Phase 1

Soft launch · 2–6 mo

KEY WORK PACKAGES

  • WP-2 metadata
  • WP-3 EOS quotas
  • WP-7 public hardening
  • WP-8 Dask
  • WP-9 HTCondor
  • WP-12 Stack B
  • WP-13 tutorials
  • WP-17 REANA library

EXIT CRITERION

Classroom of 30+ external students completes a tutorial run, hands-off; one HEP-ML dev runs a non-trivial REANA workflow

Phase 2

Scale + open APIs · 6–12 mo

KEY WORK PACKAGES

  • WP-5 GPU
  • WP-10 ServiceX/Y
  • WP-11 federated burst
  • WP-13 Integration Challenge
  • WP-14 DOI
  • WP-18 IRI API
  • WP-19 agentic

EXIT CRITERION

AmSC client drives ODF programmatically; opendata-mcp + reana-mcp live; SKILL marketplace open

Phase 3

Graduate · 12+ mo

KEY WORK PACKAGES

  • WP-15 AF graduation
  • Peer project engagement
  • Multi-experiment corpus
  • Marketplace community contributions

EXIT CRITERION

First validated patterns graduate from RP1 to AF; ODF as one node in a federated US mirror
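
The work-package lists above can be cross-checked mechanically. A Python sketch that transcribes the numbered WPs per phase (the unnumbered Phase 3 items are left out), then reports which WPs appear in more than one phase and whether any number is unscheduled. Treating the numbering as contiguous from WP-1 to WP-19 is an assumption based on the highest number shown on this slide.

```python
from collections import defaultdict

# Numbered work packages per phase, transcribed from the roadmap.
ROADMAP = {
    "Phase 0": ["WP-1", "WP-4", "WP-6", "WP-12", "WP-16"],
    "Phase 1": ["WP-2", "WP-3", "WP-7", "WP-8", "WP-9", "WP-12", "WP-13", "WP-17"],
    "Phase 2": ["WP-5", "WP-10", "WP-11", "WP-13", "WP-14", "WP-18", "WP-19"],
    "Phase 3": ["WP-15"],
}


def phases_by_wp(roadmap):
    """Invert the roadmap: work package -> list of phases it appears in."""
    index = defaultdict(list)
    for phase, wps in roadmap.items():
        for wp in wps:
            index[wp].append(phase)
    return dict(index)


def wp_number(wp):
    """Numeric part of a label such as 'WP-12', used for sorting."""
    return int(wp.split("-")[1])


index = phases_by_wp(ROADMAP)
multi_phase = sorted((wp for wp, ph in index.items() if len(ph) > 1), key=wp_number)
missing = [f"WP-{n}" for n in range(1, 20) if f"WP-{n}" not in index]

print("In more than one phase:", multi_phase)
print("Unscheduled in WP-1..WP-19:", missing)
```

On the lists as printed, WP-12 (Stack A, then Stack B) and WP-13 (tutorials, then Integration Challenge) each span two phases, and every number from 1 to 19 is scheduled somewhere.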

9 of 9

DISCUSSION

Open questions for the team

A sampling of questions to answer in v0.3.

1

Storage corpus posture

Open Data licensing for Rucio-mirrored hosting — confirm with ATLAS Open Data team. Initial dataset selection and replication-rule shape.

Owner: Judith

2

REANA deployment topology

Single instance shared with AF, or parallel on RP1 with shared storage and IAM? Workflow languages — Serial + Snakemake day-one, defer CWL/Yadage?

Owner: Aidan & Fengping (with the AF REANA operator)

3

Equipment & networking

ODF compute pool sizing, GPU class, ≥25 Gbps fabric to MWT2_OPENDATA dCache, CVMFS reachability for new node classes.

Owner: David & Farnaz

4

Public-tier security model

Threat-model and hardening plan with UChicago security before public exposure. Gates §4.7 agentic exposure too.

Owner: Aidan, Fengping & Judith

5

GPU contention with internal AF users

Scheduling policy for ODF GPU jobs vs internal ATLAS workloads. REANA-dispatched GPU jobs included in the same policy.

Owner: Aidan & Fengping (with Giordon & Rob)

6

Agentic-tool security posture

Even at Vetted tier, MCP tools wrapping shell execution are risky. Allowlist policy + audit logging — review before WP-19 lands.

Owner: Aidan & Fengping (with Giordon, Ilija and Rob)
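
As a starting point for the review, "allowlist policy + audit logging" could be sketched as below. Everything here is hypothetical: the logger name, the allowlist contents, and the function are placeholders for discussion, not a proposed policy. The sketch only decides and logs; a caller would pass the returned argument list to a process launcher without a shell, so operators such as `&&` or `;` are never interpreted.

```python
import logging
import shlex

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("odf.audit")  # hypothetical logger name

# Hypothetical allowlist: executable -> permitted subcommands.
ALLOWLIST = {
    "reana-client": {"ping", "status", "list", "logs"},
    "rucio": {"list-dids", "list-files"},
}


def check_command(user, command_line):
    """Return the parsed argv if allowlisted, else raise PermissionError.

    Every decision is written to the audit log.
    """
    try:
        argv = shlex.split(command_line)
    except ValueError:
        argv = []  # unbalanced quotes and similar: treat as denied
    allowed = (
        len(argv) >= 2
        and argv[0] in ALLOWLIST
        and argv[1] in ALLOWLIST[argv[0]]
    )
    audit.info("user=%s allowed=%s command=%r", user, allowed, command_line)
    if not allowed:
        raise PermissionError(f"command not allowlisted: {command_line!r}")
    return argv


print(check_command("alice", "reana-client status -w demo"))
try:
    check_command("alice", "bash -c 'id'")
except PermissionError as exc:
    print("denied:", exc)
```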
