
ePIC Streaming Computing Model

ePIC Streaming Computing Model Working Group, May 2025

Four Tiers:

Echelon 0: ePIC Experiment

Echelon 1: Host Labs

Echelon 2: Global processing and data facilities, including HPC and HTC resources

Echelon 3: Home institute computing 

We have been developing the DAQ-to-Computing and Echelon 0 to Echelon 1 architectures and options.

These slides are a May 2025 update of an Oct 2024 original

ePIC readout chain - DAQ meets Computing within Echelon 0

DAQ & computing working together on

  • Buffering before data leaves E0 for E1s
  • Monitoring systems
    • Monitoring the experiment, slow controls, DAQ & detector configurations
  • DAQ-side calibration functions
  • Collider-experiment feedback
  • (Minimal) software filtering, e.g. suppressing empty channels and redundant headers (see the sketch below)
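
To make the minimal-filtering idea concrete, here is a toy sketch in Python; the ChannelBlock structure and filter_time_frame function are illustrative assumptions, not the actual ePIC raw-data format or DAQ code.

```python
# Illustrative sketch only: a toy suppression pass over decoded channel
# payloads. The channel/payload structure is hypothetical, not the ePIC format.
from dataclasses import dataclass

@dataclass
class ChannelBlock:
    channel_id: int
    header: bytes      # per-channel header
    payload: bytes     # hit data; empty if the channel saw nothing

def filter_time_frame(blocks: list[ChannelBlock], common_header: bytes) -> list[ChannelBlock]:
    """Drop empty channels and strip per-channel headers that merely repeat
    the common (time-frame level) header."""
    kept = []
    for block in blocks:
        if not block.payload:              # suppress empty channels
            continue
        if block.header == common_header:  # redundant header: keep payload only
            block.header = b""
        kept.append(block)
    return kept
```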

Echelon 0 computing

  • Option 1: fully in the DAQ room at IP6 - disfavored because of the con below
    • Con: Requires infrastructure for all E0 processing at IP6. Doesn’t leverage SDCC.
    • Pro: Egress stream ready to fork to the two E1s: E1 parity is manifest
    • Pro: Fully under ePIC DAQ control
  • Option 2: split with a DAQ enclave in the BNL data center - favored as of May 2025
    • Leverage data center resources: infrastructure, space, flexibility
    • Provide a partitioned-off ‘DAQ enclave’ effectively owned and controlled by DAQ
      • Same access controls, security, sysadmin access as DAQ room
      • The IP6-SDCC fiber and switches are also 'DAQ owned': full control, no shared traffic, minimal latency
      • Requires a 24x7 SLA and urgent 24x7 physical access for DAQ experts
    • Egress streams from enclave symmetrically fork to the two E1 sites

Cost analysis of both options required

NB: online monitor == DAQ monitor

Echelon 0 - Echelon 1 data flow and processing

  • DAQ’s domain extends across the IP6-SDCC fiber to the enclave
    • In the enclave is the online farm where DAQ assembles detector data into the event data stream
  • The event data stream is in the form of time frames (TFs) aggregated in contiguous blocks of ~1000 TFs == a super time frame (STF)
    • A TF contains all the detector data in a time window of ~half a millisecond
    • An STF is the atomic data unit for post-DAQ raw data processing
  • DAQ inserts file and run markers into the stream
    • STFs map to ~2 GB files, with the full dataset delivered identically to the two E1s
  • Echelon 1 sites are symmetric peers
    • E1 principal responsibilities: archiving the stream, prompt processing, monitoring
    • Ensures two geographically separated complete raw data copies
    • Determining the E1 roles in detail will be up to the ePIC collaboration together with the sites
  • Buffers in the DAQ and at the Echelon 1 sites provide latency tolerance to avoid deadtime, smooth streaming operation and robustness against data-flow interruptions (a rough sizing sketch follows this list)
    • DAQ enclave STF buffer
      • holds STFs built in the enclave until validated as received at an E1
      • sized to hold 72 hours of data (the project-agreed number in the DAQ requirements)
    • E1 buffers
      • sized to hold 3 weeks of data, such that data is disk resident through the ~2 week calibration process
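
As a rough cross-check of what these buffer depths imply, a back-of-envelope sizing sketch in Python using only the figures quoted above (TF window, TFs per STF, STF file size); the resulting rates and volumes are illustrative, not official ePIC estimates.

```python
# Back-of-envelope buffer sizing from the figures quoted on this slide.
# Treat these as illustrative inputs, not official ePIC rate estimates.
TF_WINDOW_S = 0.5e-3   # ~half a millisecond per time frame
TFS_PER_STF = 1000     # ~1000 TFs per super time frame
STF_SIZE_GB = 2.0      # STFs map to ~2 GB files

stf_span_s = TF_WINDOW_S * TFS_PER_STF   # ~0.5 s of data per STF
rate_gb_s = STF_SIZE_GB / stf_span_s     # ~4 GB/s sustained into the buffer

enclave_buffer_pb = rate_gb_s * 72 * 3600 / 1e6   # 72-hour DAQ enclave buffer, ~1 PB
e1_buffer_pb = rate_gb_s * 21 * 86400 / 1e6       # 3-week E1 buffer, ~7 PB

print(f"sustained rate ~ {rate_gb_s:.1f} GB/s ({rate_gb_s * 8:.0f} Gb/s)")
print(f"72 h enclave buffer ~ {enclave_buffer_pb:.1f} PB")
print(f"3-week E1 buffer ~ {e1_buffer_pb:.1f} PB")
```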

Echelon 0 - Echelon 1 data flow and processing

[Schematic: Echelon 0 at IP6 - the DAQ room and DAQ enclave with switches (4 Tbps), the STF buffer (72 hr depth) and an external subnet for E1 delivery - feeding the ePIC Echelon 1 sites at the BNL and JLab data centers (400 Gbps via ESnet); each E1 holds a buffer, archive, prompt processing and prompt monitoring; STF stream managed by Rucio, fast-monitoring TF stream possibly via messaging.]

  • Super timeframe (STF) stream - complete raw data
    • Bulk data in STF format, built in the DAQ enclave
    • Managed by Rucio: registered at the STF buffer via the external subnet and sent to the E1 buffers (see the sketch after this list)
    • Serves the full-data-sample consumers: archiving, prompt processing and monitoring
  • Timeframe (TF) stream - fast subsample
    • A subset sent quickly at finer granularity to the E1s for fast monitoring, with data available within a few seconds
    • Could be constructed in the DAQ enclave in parallel with the STFs, or skimmed from the STFs in the STF buffer
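
As an illustration of how the Rucio-managed STF stream might be driven, a minimal sketch using the Rucio Python client; the scope, dataset name and RSE names (EPIC_E1_BNL, EPIC_E1_JLAB) are hypothetical placeholders, not agreed ePIC conventions.

```python
# Minimal sketch: register freshly built STF files with Rucio and request a
# replica at each Echelon 1 site. Scope, DID names and RSE names are invented
# for illustration; the real ePIC conventions are still being defined.
from rucio.client import Client

client = Client()

SCOPE = "epic.raw"               # hypothetical scope for raw STF data
run_dataset = "run_012345_stf"   # hypothetical per-run dataset name

# One dataset per run; the file and run markers in the stream tell us when to open it.
client.add_dataset(scope=SCOPE, name=run_dataset)

# Attach STF files that have landed in the enclave STF buffer (already uploaded
# to the buffer's RSE by a separate transfer agent, not shown here).
stf_files = [{"scope": SCOPE, "name": f"run_012345_stf_{i:06d}"} for i in range(3)]
client.attach_dids(scope=SCOPE, name=run_dataset, dids=stf_files)

# Two copies, one at each E1, so both sites hold a complete raw data set.
client.add_replication_rule(
    dids=[{"scope": SCOPE, "name": run_dataset}],
    copies=2,
    rse_expression="EPIC_E1_BNL|EPIC_E1_JLAB",
)
```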

Getting to the specifics of E0-E1 dataflow and workflow: Testbeds

  • We have our streaming computing model document V2 (Oct 2024) and an evolving conception of the E0-E1 dataflow and workflows, developed in an active streaming computing model meeting series and expressed in the schematic above
  • The emphasis is now moving from reports and schematics to the specifics
    • Prototyping ideas and tools in testbeds, guided by requirements
  • The requirements document is gathering input and serves as an instructive guide
  • Testbed work is beginning: infrastructure installed and ready, people identified, growing list of questions to address
    • Developing E0-E1 streaming workflows in a testbed utilizing Rucio and PanDA
      • R&D instances of Rucio and PanDA are operating at BNL for this
      • Effort is in place and ramping up; an initial testbed plan is established; work and weekly meetings are starting (mid-May)
    • Calibration workflows are to be covered as well in the testbeds
      • Describing and executing complex calibration workflows with their dependencies
      • Supporting a detector/data state machine describing dynamic, concurrent activity across the ePIC detector and the machine (a sketch follows this list)
      • Calibration information will be gathered across the DSCs and will help establish test workflows
    • Streaming reconstruction
      • Raw data stream to collision event identification to reconstruction and analysis
    • Data handling, storage and archiving
      • Object stores? XRootD roles? Handling fine-grained low-latency data distinct from bulk stream?
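
To make the detector/data state machine idea more tangible, a toy per-subsystem sketch in Python; the states, transitions and subsystem names are placeholders, to be driven by the real calibration and run-control requirements.

```python
# Toy sketch of a detector/data state machine tracking concurrent, per-subsystem
# activity. States and transitions are placeholders, not the agreed ePIC design.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    TAKING_DATA = auto()
    CALIBRATING = auto()
    CALIBRATED = auto()

# Allowed transitions: each subsystem advances independently, so e.g. the
# calorimeter can be CALIBRATING while the tracker is still TAKING_DATA.
ALLOWED = {
    State.IDLE:        {State.TAKING_DATA},
    State.TAKING_DATA: {State.CALIBRATING, State.IDLE},
    State.CALIBRATING: {State.CALIBRATED, State.TAKING_DATA},
    State.CALIBRATED:  {State.TAKING_DATA, State.IDLE},
}

class SubsystemStateMachine:
    def __init__(self, name: str):
        self.name = name
        self.state = State.IDLE

    def advance(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.name}: illegal transition "
                             f"{self.state.name} -> {new_state.name}")
        self.state = new_state

# Concurrent activity across the detector is just a collection of machines.
detector = {name: SubsystemStateMachine(name) for name in ("tracker", "calorimeter", "pid")}
detector["tracker"].advance(State.TAKING_DATA)
detector["calorimeter"].advance(State.TAKING_DATA)
detector["calorimeter"].advance(State.CALIBRATING)
```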
