SKA Regional Centres and the SKAO Data Landscape
Rosie Bolton - Head of Data Operations
BNL Round Table March 2022
SKA Regional Centres
SKA Regional Centres: SKAO data processing stages
SKA LOW
SKA MID
2 Pb/s
8.9 Tb/s
7.8 Tb/s
8.9 Tb/s
100 Gb/s
SKA Regional Centres
100 Gb/s
20 Tb/s
Data Products
Correlated / conditioned signals
Beamformed data streams (focused on sky patch)
Large-area response data streams
Generality
Specificity
How do users "control" data products?
Generality
Specificity
The Role of SRCs: Collaboration platform
SRCs will bridge the gap between the highly data-intensive, pre-defined workflows that generate SKA data products in the SDP and the iterative, flexible, user-led data analysis required to produce scientific results
SRCs will provide collaborative tools backed up by powerful compute and data management
Credit: Heywood et al.; Sophia Dagnello, NRAO/AUI/NSF; STScI.
Image cut-outs
Plots for publication
Paul, Sourabh et al. (2016). ApJ 833. 10.3847/1538-4357/833/2/213.
Power spectra
Catalogues / Source List
Workflows notebooks
Users will not have access to the SDP or to Raw SKA data!
👀
The Role of SRCs: Support data product (re-)use
Why
SKA Regional Centre Capabilities
Interoperability
Heterogeneous SKA data from different SRCs and other observatories
Support to Science Community
Support community on SKA data use, SRC services use, Training, Project Impact Dissemination
Visualization
Advanced visualizers for SKA data and data from other observatories
Science Enabling Applications
Analysis Tools, Notebooks, Workflow execution, Machine Learning, etc.
Distributed Data Processing
Computing capabilities provided by the SRCNet to allow data processing
Data Discovery
Discovery of SKA data from the SRCNet, local or remote, transparently to the user
Data Management
Dissemination of Data to SRCs and Distributed Data Storage
SKA Regional Centres: Data management
Storing SKAO data, growing by up to 700 PB each year, will be a challenge (and user-generated data adds to this).
Several million dollars per year to store the new data, for one copy
Global data management within SRCNet should enable best possible use to be made of available storage resources
Avoid (reduce) unnecessary duplication
Support mirroring of popular data products to enhance user experience
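To put the 700 PB per year figure above in context, it can be converted into an average sustained data rate. The sketch below is a back-of-envelope calculation only: it assumes decimal petabytes (10^15 bytes) and a perfectly even flow over the year, neither of which is stated on the slides, and the function name is illustrative.

```python
# Back-of-envelope conversion of annual data growth into an average rate.
# Assumptions (not from the slides): 1 PB = 10**15 bytes, data flows evenly all year.
BYTES_PER_PB = 10**15
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sustained_rate_gbps(petabytes_per_year: float) -> float:
    """Average rate in gigabits per second needed to move this volume in one year."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PB / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 1e9


rate = sustained_rate_gbps(700)
print(f"700 PB/year is roughly {rate:.0f} Gb/s on average")
```

Under these assumptions the result is on the order of the 100 Gb/s figures quoted earlier in the deck. Real traffic will be bursty, so peak requirements would be higher than this average.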
DATA STORAGE
ESCAPE Data Management
ESCAPE WP2 collaboration - CERN as lead, but developing real interest from several Astro-Particle / HEP Experiments
CTAO, KM3NeT, LOFAR, SKAO, FAIR
ESCAPE DATA LAKE DEPLOYMENT
Astronomy Data Management flows
Image from MAGIC telescope, but applicable to many astro use cases
"Remote" might mean up a mountain or, for SKA, simply far from data analysis facilities
remote telescope data generation
distributed data access / analysis centres - for SKA: SKA Regional Centres
clear space at telescope site
SKA Rucio testbed - our own sandpit to play in
SKAO team interest - Exploring technologies with an eye on ease of operation
Software-defined infrastructures
Reproducible platform packages (copy/paste)
Rucio (central brain data management)
Storage Inventory (decentralised data management, site subscription model)
Metadata*: enhance findability and interoperability of astronomy data products
*Ranged metadata functionality in Rucio; metadata special interest group
End
First Prototyping Phase: 2022-2023
Work is now under way to identify development teams to prototype key technologies, so that technologies can be selected as SRC functionality and scale grow.
Data Management service:
Replication, distribution, synchronisation of data products and location index
Federated Authentication and Authorization: identity management, compatible with SKAO
Data Analysis: Science Extraction, Processing in Notebooks
Data Visualisation and discovery - performance at SKA scale
Central Services and Software Distribution: SW infrastructure, compute provision
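As an illustration of the "location index" idea in the Data Management service item above, here is a minimal in-memory sketch that records which sites hold a replica of each data product and refuses to create duplicate copies at the same site. It is a toy model only: the class, method names, product identifier and site names are all hypothetical, and it does not represent the Rucio or Storage Inventory APIs.

```python
# Toy location index: maps each data product to the set of sites holding a replica.
# All names here are hypothetical; this is not the Rucio or Storage Inventory API.
class LocationIndex:
    def __init__(self) -> None:
        self._replicas: dict[str, set[str]] = {}

    def register(self, product: str, site: str) -> None:
        """Record that a site already holds a copy of the product."""
        self._replicas.setdefault(product, set()).add(site)

    def locate(self, product: str) -> list[str]:
        """Return the sites holding the product, in sorted order."""
        return sorted(self._replicas.get(product, set()))

    def replicate(self, product: str, site: str) -> bool:
        """Request a replica at a site; return False if one already exists there."""
        sites = self._replicas.setdefault(product, set())
        if site in sites:
            return False  # avoid unnecessary duplication
        sites.add(site)   # in a real system this would trigger a transfer
        return True


index = LocationIndex()
index.register("image-cube-001", "SRC-A")
already_there = index.replicate("image-cube-001", "SRC-A")
mirrored = index.replicate("image-cube-001", "SRC-B")
print(index.locate("image-cube-001"))
```

A production service would also have to handle synchronisation, deletion and access control, which this sketch leaves out.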
The Role of SRCs: Batch processing
SRC Network global capabilities
Collectively meet the needs of the global community of SKA users
Anticipate heterogeneous SRCs, with different strengths
Pledging
Each SRC to pledge resources into global pool to support SRCNet activities
Users can access resources across SRCNet according to their research needs and permissions
The hope is that each SRC will be able to contribute a total effort proportional to its SKA fraction
Additional resources at an SRC could be given to the pool or prioritised to support national interests
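The proportional pledging idea above can be sketched as a simple calculation: each SRC's target contribution is the global requirement scaled by its share, and anything offered beyond that target is surplus that could go to the pool or to national priorities. The SRC names, shares, requirement and offered amounts below are invented purely for illustration and do not come from the slides.

```python
# Sketch of proportional pledging. All names and numbers are hypothetical.
def pledge_targets(total_need: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a global resource requirement in proportion to each SRC's share."""
    total_share = sum(shares.values())
    return {src: total_need * share / total_share for src, share in shares.items()}


def surplus(offered: dict[str, float], targets: dict[str, float]) -> dict[str, float]:
    """Resources offered beyond the proportional target (never negative)."""
    return {src: max(0.0, offered.get(src, 0.0) - target) for src, target in targets.items()}


shares = {"SRC-A": 50, "SRC-B": 30, "SRC-C": 20}   # hypothetical SKA fractions (percent)
targets = pledge_targets(700, shares)              # e.g. 700 PB of storage to cover
extra = surplus({"SRC-A": 400, "SRC-B": 210, "SRC-C": 100}, targets)
print(targets)
print(extra)
```

Note that a shortfall (SRC-C in this example) simply shows up as zero surplus here; how shortfalls would be handled is a policy question the slides do not address.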
How
Operations
Personnel within each SRC project will be identified to join the SRC Operations Group (SOG), meeting regularly to discuss issues, share tasks, and monitor and test global system health
SOG will be led from SKAO Ops, with a team from across each SRC project and SKAO.
(An example dashboard from our data management prototype. The details are not important, but it is nice to see that we are using UK grid storage endpoints in our Rucio prototype, which itself runs on IRIS resources in the STFC cloud.)
How