1 of 14

EIC Computing Meeting

Oct. 18 2021

D. Lawrence

Computing Plan Status

2 of 14

Data Storage and Compute

ECCE Bi-Weekly meeting - Computing Team Report - Sep. 27, 2021

[Diagram: data flow from the detector to storage]

Components:
  • Detector
  • FEB (digitization)
  • FELIX-like FEP (initial filter)
  • Online Event Filter (CPU, FPGA, GPU)
  • Online Buffer, EBDC (few days)
  • Offline Buffer (few weeks)
  • Calibration
  • Reconstruction
  • raw storage
  • recon storage

Locations and funding:
  • Experimental Hall and Counting House (Project Funds)
  • Data Center(s): SDCC [, JLab, …] (Operations Funds)
  • HTC Compute Facilities: SDCC, JLab, … (Operations)

Link rates, as labeled: O(100Tbps), O(10Tbps), O(0.1Tbps), O(0.1Tbps), O(10GB/s), O(1GB/s), O(0.1Tbps), O(0.01Tbps)

Annotations: reduce ~x2; mitigate risk of network failure

3 of 14

Data Storage and Compute

[Diagram: data flow from the detector to storage, with a possible offline event filter]

Components:
  • Detector
  • FEB (digitization)
  • FELIX-like FEP (initial filter)
  • Online Event Filter (CPU, FPGA, GPU)
  • Online Buffer, EBDC (few days)
  • Offline Event Filter?
  • Offline Buffer (few weeks)
  • Calibration
  • Reconstruction
  • raw storage
  • recon storage

Locations and funding:
  • Experimental Hall and Counting House (Project Funds)
  • Data Center(s): SDCC [, JLab, …] (Operations Funds)
  • HTC Compute Facilities: SDCC, JLab, … (Operations)

Link rates, as labeled: O(100Tbps), O(10Tbps), O(1Tbps), O(0.1Tbps), O(10GB/s), O(1GB/s), O(0.1Tbps), O(0.01Tbps), O(1Tbps)

Annotations: reduce ~x2; mitigate risk of network failure

4 of 14

EIC Streaming Readout (From Fernando Barbosa’s talk at AI4EIC Sep. 9, 2021)

factor of 100 in data reduction

5 of 14

from EIC Yellow Report

From YR 2.10

“Most of the key physics topics discussed in the EIC White Paper [2] are achievable with an integrated luminosity of 10 fb^-1 corresponding to 30 weeks of operations.”

10^33 cm^-2 s^-1 x 30 weeks = 18.1 fb^-1

(Assumes 100% beam up time. The YR also quotes a 60% operational efficiency, which would bring this to 10.3 fb^-1.)

Bottom line: 10 fb^-1 represents roughly the first year of running, with about half of that for ep scattering.
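
The 100% up time figure above can be checked with a short calculation. This is a minimal sketch, assuming a constant instantaneous luminosity of 10^33 cm^-2 s^-1 (the value that reproduces 18.1 fb^-1) and the conversion 1 fb^-1 = 10^39 cm^-2. The function name and its parameters are illustrative and do not come from the Yellow Report.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600
CM2_PER_INVERSE_FB = 1e39  # 1 fb = 1e-39 cm^2, so 1 fb^-1 = 1e39 cm^-2


def integrated_luminosity_fb(inst_lumi_cm2_s, weeks, efficiency=1.0):
    """Integrated luminosity in fb^-1 for a constant instantaneous luminosity.

    efficiency is the fraction of wall-clock time with beam (assumed constant).
    """
    live_seconds = weeks * SECONDS_PER_WEEK * efficiency
    return inst_lumi_cm2_s * live_seconds / CM2_PER_INVERSE_FB


# Assumed luminosity of 1e33 cm^-2 s^-1 over 30 weeks at 100% up time.
full_uptime = integrated_luminosity_fb(1e33, weeks=30)
print(f"{full_uptime:.1f} fb^-1 at 100% up time")
```

The efficiency argument scales the live time linearly, so any operational efficiency can be substituted without changing the rest of the calculation.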

6 of 14

For ep scattering representing half of a 30 week year of running we expect:

50 μb x 5 fb^-1 = 250B collisions

Inclusive ep scattering MC DST: ~600MB/2k events = 300kB/event

250B / (15 weeks * 7 days/wk * 24 hr/day * 3600 s/hr) / (60% accel. efficiency) = 46kHz

46kHz * 300kB = 14GB/s (140Gbps)

from EIC Yellow Report

close to Jin’s simulation calculation of O(100Gbps)
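
The chain of estimates on this slide can be reproduced in a few lines. This is a sketch using only the numbers quoted above (50 μb, 5 fb^-1, 15 weeks, 60% efficiency, 300 kB/event); the function name and argument names are hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def ep_data_rate(cross_section_ub=50.0, lumi_fb=5.0, weeks=15,
                 efficiency=0.6, event_size_kb=300.0):
    """Return (collisions, event rate in Hz, data rate in GB/s)."""
    # 1 ub = 1e-6 b and 1 fb^-1 = 1e15 b^-1, so N = sigma * L is dimensionless.
    collisions = cross_section_ub * 1e-6 * lumi_fb * 1e15
    # Collisions only happen while the accelerator is delivering beam.
    live_seconds = weeks * SECONDS_PER_WEEK * efficiency
    rate_hz = collisions / live_seconds
    # kB -> bytes -> GB (decimal units).
    gb_per_s = rate_hz * event_size_kb * 1e3 / 1e9
    return collisions, rate_hz, gb_per_s


collisions, rate_hz, gb_per_s = ep_data_rate()
print(f"{collisions:.3g} collisions, {rate_hz / 1e3:.0f} kHz, {gb_per_s:.0f} GB/s")
```

Converting GB/s to a line rate depends on the bits-per-byte convention. A factor of 10 bits per byte, as is common when link encoding overhead is included, matches the 140 Gbps quoted above; a factor of 8 would give a somewhat lower figure.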

7 of 14

Multiple levels of event filter

Stage                          Input/Output      Reduction Factor   Technology
FELIX-like Compute Interface   100Tbps/10Tbps    10                 FPGA
Online Event Filter            10Tbps/1Tbps      10                 FPGA, (GPU), CPU
Offline Event Filter           <1Tbps/0.1Tbps    ~10                FPGA, GPU, CPU
Reconstruction                 1Tbps/0.1Tbps     10                 (FPGA), GPU, CPU
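
The cumulative effect of the three filter stages listed above can be sketched by chaining their reduction factors. This is an illustration only: it treats each order-of-magnitude factor as exact, takes 100 Tbps as the input, and leaves out the Reconstruction row. The names STAGES and cascade are hypothetical.

```python
# (stage name, reduction factor) in pipeline order; factors are the
# order-of-magnitude values from the rows above, treated here as exact.
STAGES = [
    ("FELIX-like Compute Interface", 10),
    ("Online Event Filter", 10),
    ("Offline Event Filter", 10),
]


def cascade(input_tbps, stages):
    """Return a list of (stage, input rate, output rate) tuples in Tbps."""
    rows = []
    rate = input_tbps
    for name, factor in stages:
        out = rate / factor
        rows.append((name, rate, out))
        rate = out  # the output of one stage feeds the next
    return rows


for name, rate_in, rate_out in cascade(100.0, STAGES):
    print(f"{name}: {rate_in:g} -> {rate_out:g} Tbps")
```

Three factors of 10 take the stream from 100 Tbps down to 0.1 Tbps, a total reduction of 1000.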

8 of 14

Federated Computing Model

[Diagram: federated data flow, with the offline stages replicated at each site]

Shared front end:
  • Detector
  • FEB (digitization)
  • FELIX-like FEP (initial filter)
  • Online Event Filter (CPU, FPGA, GPU)
  • Online Buffer, EBDC (few days)
  • Offline Event Filter

Link rates, as labeled: O(100Tbps), O(10Tbps), O(1Tbps), O(0.1Tbps)

Site 1 (e.g. SDCC), Site 2 (e.g. JLab), and Site 3 (e.g. NERSC) each contain:
  • Offline Buffer
  • Calibration
  • raw storage
  • Reconstruction
  • recon storage

Annotations: reduce ~x2; mitigate risk of network failure

9 of 14

Federated Computing Model

Benefits of using a federated model in the near-real-time pipeline

  • Each site only needs to handle a fraction of the data
  • EIC computing becomes a smaller fraction of each compute farm
  • A temporary loss of capacity at one site can easily be absorbed by the others without reconfiguration
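
The third benefit can be illustrated with a toy load-sharing calculation. This is a sketch under stated assumptions: the stream is split in proportion to each site's available capacity, the three sites have equal weights, and the total is 100 Gbps (taken from the O(0.1Tbps) scale). The function share_load and the capacity values are hypothetical and not part of the plan.

```python
def share_load(total_gbps, capacities):
    """Split a data stream across sites in proportion to available capacity."""
    available = sum(capacities.values())
    if available <= 0:
        raise ValueError("no site has capacity available")
    return {site: total_gbps * cap / available
            for site, cap in capacities.items()}


# Hypothetical relative capacities; equal weights are an assumption.
sites = {"SDCC": 1.0, "JLab": 1.0, "NERSC": 1.0}
normal = share_load(100.0, sites)

# One site temporarily drops to half capacity; the same rule redistributes
# the stream and the other sites absorb the difference.
degraded = share_load(100.0, {**sites, "NERSC": 0.5})

for site in sites:
    print(f"{site}: {normal[site]:.1f} -> {degraded[site]:.1f} Gbps")
```

The same function covers both cases, which is the sense in which no reconfiguration is needed: only the capacity inputs change.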

10 of 14

Worldwide LHC Computing Grid (WLCG)

  • Tier 0 - LHC
    • Store all raw data
    • 20% of LHC computing
    • Initial reconstruction
    • Distribute raw data and initial recon to all Tier 1 sites
  • Tier 1 - multiple sites (13)
    • Combined capacity to store all data from Tier 0
    • Compute for large scale reprocessing
    • Distribute data to Tier 2 sites
    • Store simulated data from Tier 2 sites
  • Tier 2 - Universities and Scientific sites (155)
    • Compute for specific analysis tasks
    • Some reconstruction
    • Simulation
  • Tier 3 - End users
    • Small clusters or personal computers

11 of 14

An EIC Butterfly Model

[Diagram: butterfly-shaped tier layout]

Nodes: EIC; JLAB; SDCC; BATES; NERSC; OSG; University (x8)

Tier labels: Tier 0, Tier 1, Tier 2, Tier 3 (shown twice)

Nearly all storage (raw data, reconstructed data, simulated data) is stored across Tier 1 sites

12 of 14

ECCE Computing Plan

ECCE Bi-Weekly meeting - Computing Team Report - Sep. 13, 2021

  • Internal deadline of Oct. 23
    • Rough draft, all sections
  • Internal review Oct. 24-30
  • Designated Reader review Nov. 1-15
  • Final draft Nov. 30

Key features:

  • Streaming Readout
  • Reconstruction in near-real time (~2 weeks)
  • Federated Computing

13 of 14

Open Questions

  • Computing/Storage
    • What are DOE rules, where are they written, can they be updated?
    • LHC-style pyramid model vs. butterfly model?
    • Common system for all EIC experiments?
  • Data Processing/Software
    • Limitations on calibration latency? (e.g. detector design)
    • Nature and number of filters. Types of technology that can be used (e.g. FPGA, GPU, …)

14 of 14

Backups