1 of 19

SKA Regional Centres and the SKAO Data Landscape

Rosie Bolton - Head of Data Operations

BNL Round Table March 2022

2 of 19

SKA Regional Centres

  • SKA Data Reduction
  • Role of SRCs
  • SKAO Rucio work intro

3 of 19

SKA Regional Centres: SKAO data processing stages

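[Diagram: data flow from the SKA LOW and SKA MID telescopes through the processing stages to the SKA Regional Centres. Data rates labelled on the diagram: 2 Pb/s, 8.9 Tb/s, 7.8 Tb/s, 8.9 Tb/s, 100 Gb/s, 100 Gb/s, 20 Tb/s. Data products shown: correlated / conditioned signals; beamformed data streams (focused on sky patch); large-area response data streams. Axis: Generality to Specificity.]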

4 of 19

How do users "control" data products?

  • Need to satisfy users whilst retaining control of SKAO resources
  • Science Data Processor is fundamentally part of each SKA Telescope
  • Users will specify required data products in Observing Proposals
  • All user interaction with data products will be in SRCs

[Axis: Generality to Specificity]

5 of 19

The Role of SRCs: Collaboration platform

SRCs will bridge the gap between the highly data-intensive, pre-defined workflows that generate SKA data products in the SDP and the iterative, flexible, user-led data analysis required to produce scientific results.

SRCs will provide collaborative tools backed up by powerful compute and data management

[Figure: example outputs of user-led analysis: image cut-outs; plots for publication; power spectra; catalogues / source lists; workflows and notebooks. Credits: Heywood et al.; Sophia Dagnello, NRAO/AUI/NSF; STScI; Paul, Sourabh et al. (2016). ApJ 833. 10.3847/1538-4357/833/2/213.]

Users will not have access to the SDP or to Raw SKA data!

👀

6 of 19

The Role of SRCs: Support data product (re-)use

  • All SKA Data Products will (in time) become public - this is likely to be the biggest science generator (see Hubble)
    • Build SKA science archive around IVOA standards
    • Ensure interoperability with other archives and other experiments

Why

7 of 19

SKA Regional Centre Capabilities

Interoperability

Heterogeneous SKA data from different SRCs and other observatories

Support to Science Community

Support the community in using SKA data and SRC services; training; dissemination of project impact

Visualization

Advanced visualizers for SKA data and data from other observatories

Science Enabling Applications

Analysis tools, notebooks, workflow execution, machine learning, etc.

Distributed Data Processing

Computing capabilities provided by the SRCNet to allow data processing

Data Discovery

Discovery of SKA data from the SRCNet, local or remote, transparently to the user

Data Management

Dissemination of Data to SRCs and Distributed Data Storage

8 of 19

SKA Regional Centres: Data management

Storing SKAO data, which will grow by up to 700 PBytes each year, will be a challenge (and user-generated data adds to this).

Each year's new data will cost several million dollars to store, for a single copy.
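
As a rough cross-check of the 700 PBytes per year figure, the sketch below converts a sustained link rate into an annual data volume. It assumes (the slides do not state this) that the two 100 Gb/s figures on the data-flow slide are the outbound links from each telescope, that both run fully loaded all year, and that 1 PByte = 10^15 bytes. The function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def annual_volume_pbytes(rate_gbit_per_s: float, duty_cycle: float = 1.0) -> float:
    """Data volume in PBytes (10**15 bytes) moved in one year at a sustained rate."""
    bytes_per_second = rate_gbit_per_s * 1e9 / 8
    return bytes_per_second * SECONDS_PER_YEAR * duty_cycle / 1e15


per_link = annual_volume_pbytes(100)  # one 100 Gb/s link
both_links = 2 * per_link             # assumption: one such link per telescope
print(f"{per_link:.1f} PBytes/year per link, {both_links:.1f} PBytes/year for two")
```

Under these assumptions two fully loaded links would carry a little under 800 PBytes per year, the same order as the "up to 700 PBytes" quoted above.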

Global data management within SRCNet should enable the best possible use of available storage resources

Avoid (reduce) unnecessary duplication

Support mirroring of popular data products to enhance user experience
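
The trade-off between avoiding duplication and mirroring popular data can be sketched as a simple replica-count policy. This is a minimal illustration, not the SRCNet design: the class, the popularity threshold, the copy limits and the example products are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    size_tb: float
    accesses_last_month: int


def target_copies(product, popular_threshold=100, base_copies=1, max_copies=3):
    """Keep one base copy, plus one mirror per `popular_threshold` accesses, capped."""
    extra = product.accesses_last_month // popular_threshold
    return min(base_copies + extra, max_copies)


def storage_needed_tb(products):
    """Total storage implied by the policy, counting every replica."""
    return sum(p.size_tb * target_copies(p) for p in products)


products = [
    DataProduct("deep_field_cube", 500.0, 350),  # heavily used: gets mirrored
    DataProduct("calibration_run", 200.0, 4),    # rarely used: single copy
]
plan = {p.name: target_copies(p) for p in products}
total_tb = storage_needed_tb(products)
print(plan, total_tb)
```

Rarely used products stay at a single copy, so storage is spent on mirrors only where users benefit from them.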

DATA STORAGE

9 of 19

ESCAPE Data Management

ESCAPE WP2 collaboration - led by CERN, with real interest developing from several Astro-Particle / HEP experiments

CTAO, KM3NET, LOFAR, SKAO, FAIR

10 of 19

ESCAPE DATA LAKE DEPLOYMENT

  • Largely built on WLCG sites
  • Considerable expertise in the deployment stack
  • Newcomers given access to this "playground" to test their non-WLCG Data Management use cases
  • Spawned interest from several communities to deploy their own Rucio instances
    • SKAO
    • CTAO / MAGIC
    • KM3NET
    • + LSST/Rubin assessment (both in ESCAPE and at Rubin)

11 of 19

Astronomy Data Management flows

Image from MAGIC telescope, but applicable to many astro use cases

"Remote" might mean up a mountain or, for SKA, just far from data analysis facilities

remote telescope data generation

distributed data access / analysis centres - for SKA: SKA Regional Centres

clear space at telescope site
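
The flow above (generate at the telescope, replicate to analysis centres, then clear space on site) can be sketched as follows. This is a toy model under stated assumptions: transfers are not modelled, the rule of two confirmed off-site copies before deletion is hypothetical, and the site and product names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    size_tb: float
    confirmed_copies: set = field(default_factory=set)  # centres with a verified replica


def replicate(product, destination):
    """Record a verified copy at a destination centre (the transfer is not modelled)."""
    product.confirmed_copies.add(destination)


def free_site_buffer(buffer, min_copies=2):
    """Delete only products with enough confirmed off-site copies; return TB freed."""
    freed = 0.0
    for name in list(buffer):  # copy the keys so we can delete while iterating
        if len(buffer[name].confirmed_copies) >= min_copies:
            freed += buffer[name].size_tb
            del buffer[name]
    return freed


buffer = {p.name: p for p in (Product("obs_001", 40.0), Product("obs_002", 25.0))}
replicate(buffer["obs_001"], "SRC-A")
replicate(buffer["obs_001"], "SRC-B")
replicate(buffer["obs_002"], "SRC-A")  # only one copy so far: must stay on site
freed = free_site_buffer(buffer)
print(freed, sorted(buffer))
```

The point of the check is that space at the telescope is released only once the data is safely held elsewhere.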

12 of 19

SKA Rucio testbed - our own sandpit to play in

  • K8s deployment (STFC Cloud)
  • X.509 authentication currently (exploring token use with ESCAPE INDIGO IAM)
  • Judicious re-use of existing stack from ESCAPE (e.g. FTS)

  • Well suited to centralised Operations model for data management
  • Already performing long-haul transfers (via our automated test framework)
  • Aim to integrate storage from national SRC efforts to increase understanding and inform assessment

13 of 19

SKAO team interest - Exploring technologies with an eye on ease of operation

Software-defined infrastructures

Reproducible platform packages (copy/paste)

Rucio (central brain data management)

Storage Inventory (decentralised data management, site subscription model)

Metadata*: enhance findability and interoperability of astronomy data products

*Ranged metadata functionality in Rucio; metadata special interest group
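
To illustrate what a ranged metadata query means for findability, the sketch below selects data products whose metadata values fall inside given intervals. It is plain Python and does not use or reproduce the Rucio API; the metadata keys, values and product names are hypothetical.

```python
def matches(metadata, ranges):
    """True if every constrained key is present and lies within [low, high]."""
    for key, (low, high) in ranges.items():
        value = metadata.get(key)
        if value is None or not (low <= value <= high):
            return False
    return True


# Hypothetical catalogue: product name -> metadata
catalogue = {
    "image_a": {"freq_mhz": 150.0, "ra_deg": 10.2},
    "image_b": {"freq_mhz": 950.0, "ra_deg": 10.4},
    "image_c": {"freq_mhz": 1400.0, "ra_deg": 200.0},
}

query = {"freq_mhz": (900.0, 1500.0), "ra_deg": (10.0, 11.0)}
hits = [name for name, md in catalogue.items() if matches(md, query)]
print(hits)
```

A query of this kind lets a user ask for "everything in this band and this patch of sky" rather than needing to know file names in advance.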

14 of 19

End

15 of 19

First Prototyping Phase: 2022-2023

Work is now under way to identify development teams to prototype key technologies, so that technologies can be selected as SRC functionality and scale grow.

Data Management service: replication, distribution, synchronisation of data products and location index

Federated Authentication and Authorization: identity management, compatible with SKAO

Data Analysis: Science Extraction, Processing in Notebooks

Data Visualisation and discovery - performance at SKA scale

Central Services and Software Distribution: SW infrastructure, compute provision

16 of 19

The Role of SRCs: Batch processing

  • SKAO and user-led batch pipeline processing of data products - e.g. combining images into deep or wide sky maps, source feature extraction, etc.
  • Details of how batch jobs will run still TBD
    • SKAO pipelines will be well defined, with data products pre-specified
    • User-submission of jobs to batch systems has not been prioritised in development work so far

17 of 19

SRC Network global capabilities

Collectively meet the needs of the global community of SKA users

Anticipate heterogeneous SRCs, with different strengths

18 of 19

Pledging

Each SRC to pledge resources into global pool to support SRCNet activities

Users can access resources across SRCNet according to their research needs and permissions

The hope is that each SRC will be able to contribute a total effort proportional to its SKA fraction

Additional resources at an SRC could be given to the pool or prioritised to support national interests
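
A proportional pledge can be worked through as simple arithmetic. The sketch below is illustrative only: the SRC names and percentage shares are invented, and the 700 PBytes total is used simply because it is the yearly data growth quoted earlier in the talk.

```python
def proportional_pledges(total_required, shares):
    """Split a total resource requirement in proportion to each member's share."""
    total_share = sum(shares.values())
    if total_share <= 0:
        raise ValueError("shares must sum to a positive number")
    return {src: total_required * share / total_share for src, share in shares.items()}


# Hypothetical SKA fractions, in percent
shares = {"SRC-A": 50, "SRC-B": 30, "SRC-C": 20}
pledges = proportional_pledges(700.0, shares)  # PBytes of storage to cover
print(pledges)
```

Anything an SRC provides beyond its proportional pledge could then be offered to the pool or reserved for national priorities, as noted above.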

How

19 of 19

Operations

Personnel within each SRC project will be identified to join the SRC Operations Group (SOG), which will meet regularly to discuss issues, share tasks, and monitor and test global system health.

The SOG will be led from SKAO Ops, with a team drawn from each SRC project and SKAO.

(An example dashboard from our data management prototype. The details are not important, but it is nice to see that we are using UK grid storage endpoints in our Rucio prototype, which itself runs on IRIS resources in the STFC Cloud.)

How