Earth System Grid Federation Future Architecture, Copernicus, Cloud and ESA
ClimateData.ca Meeting, 23 November 2021
Philip Kershaw, Technical Manager
Centre for Environmental Data Analysis
Earth System Grid Federation: a globally distributed data archive for climate data
ESGF Dashboard: http://esgf-ui.cmcc.it
ESGF – Application and Evolution
ESGF Frontend
ESA Climate Change Initiative Open Data Portal
CMIP/CORDEX for Copernicus Climate Data Store
CMIP5 >> Earth System Grid Federation >> CMIP6
Public Cloud Public Dataset Programme(s)
ESGF Future Architecture >> ESGF 2.0
Institutional-based hosting
Data lakes – Government-sponsored
ESA Climate Change Initiative Open Data Portal
2 Phases:
C3S 34[a-f] Projects for the CDS
ESGF: an international federation of nodes providing a network of access points to model data
[Diagram: a single point of access, using DNS load balancing, in front of three nodes]
C3S 34a/b system: a single resilient point of access to data delivered through replication and redundancy
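DNS load balancing gives clients a rotating list of node addresses, so requests are spread across replicas and a failed node can be skipped. The sketch below shows that round-robin-with-failover idea on the client side. The node names and the `mark_down` health signal are hypothetical, and this is an illustration of the general technique, not the C3S 34a/b implementation.

```python
from itertools import cycle


class RoundRobinPool:
    """Rotate through replica nodes, skipping any marked unhealthy."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._ring = cycle(self._nodes)
        self._down = set()

    def mark_down(self, node):
        self._down.add(node)

    def mark_up(self, node):
        self._down.discard(node)

    def next_node(self):
        # Try each node at most once per call; fail if none is healthy.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node not in self._down:
                return node
        raise RuntimeError("no healthy nodes available")


# Hypothetical host names standing in for the replicated data nodes.
pool = RoundRobinPool(["node-a.example.org", "node-b.example.org", "node-c.example.org"])
pool.mark_down("node-b.example.org")
print([pool.next_node() for _ in range(4)])
```

With one node down, requests alternate between the remaining two; this redundancy is what keeps a single access point available.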
C3S Resilient CMIP and CORDEX Data Access
[Diagram: the CDS downloads data through a single point of access (DNS-based load balancing) backed by three data nodes: the master data node at CEDA and data nodes at DKRZ and IPSL, with netCDF model data replicated from the master node to the other two]
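ESGF catalogues record a checksum for each published file, so a replica can be verified by recomputing it. This is a minimal sketch assuming SHA-256 and an expected value taken from the catalogue; `sha256_of` and `replica_is_valid` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def sha256_of(path, block_size=1 << 20):
    """Stream a file through SHA-256 so large netCDF files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def replica_is_valid(path, expected_sha256):
    """Compare a replica's checksum with the value recorded for the master copy."""
    return sha256_of(path) == expected_sha256.lower()


# Demonstration with a throwaway file standing in for a replicated netCDF file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example bytes")
expected = hashlib.sha256(b"example bytes").hexdigest()
print(replica_is_valid(tmp.name, expected))  # True
os.remove(tmp.name)
```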
Access control complex to maintain
OPeNDAP for data sub-setting is inefficient
Sub-setting Services for C3S 34e Project
Analyse datasets and make an inventory of fixes
Apply fixes to datasets before subsetting/regridding operations
C3S 34e Project
Credit: Ag Stephens, CEDA
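The two steps above (an inventory of fixes per dataset, applied before subsetting) can be sketched as follows. The dataset identifier, the in-memory dataset layout and the `add_kelvin_offset` fix are all hypothetical; this illustrates the ordering of operations, not the project's actual code.

```python
from datetime import date


def add_kelvin_offset(ds):
    """Hypothetical fix: values labelled 'K' were actually stored in degrees Celsius."""
    return {**ds, "values": [v + 273.15 for v in ds["values"]]}


# Hypothetical inventory of fixes, keyed by dataset identifier.
FIX_INVENTORY = {
    "example.model.tas.v1": [add_kelvin_offset],
}


def apply_fixes(dataset_id, ds):
    """Apply every fix recorded for this dataset, in order; unknown IDs pass through."""
    for fix in FIX_INVENTORY.get(dataset_id, []):
        ds = fix(ds)
    return ds


def subset_time(ds, start, end):
    """Keep only the time steps within [start, end]."""
    pairs = [(t, v) for t, v in zip(ds["times"], ds["values"]) if start <= t <= end]
    return {**ds, "times": [t for t, _ in pairs], "values": [v for _, v in pairs]}


raw = {
    "units": "K",
    "times": [date(2000, 1, d) for d in (1, 2, 3)],
    "values": [10.0, 11.0, 12.0],
}
fixed = apply_fixes("example.model.tas.v1", raw)
print(subset_time(fixed, date(2000, 1, 2), date(2000, 1, 3))["values"])
```

Fixing first means every subsetting or regridding operation sees corrected data, regardless of which operation the user requests.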
ESA Earth Observation Exploitation Platform Common Architecture (EOEPCA)
ESGF Future Architecture
Platforms and systems administration
Modular, scalable architecture: Containers, Kubernetes
Embrace infrastructure-as-code approach
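Infrastructure-as-code means deployments are declared in files kept under version control rather than configured by hand. A minimal sketch, assuming a hypothetical file-serving service on the public `nginx` image: the fields are standard Kubernetes `apps/v1` Deployment fields, but the names and labels are illustrative, not ESGF's actual configuration.

```python
import json


def nginx_file_server_deployment(replicas=2, image="nginx:1.21"):
    """Build a Kubernetes Deployment manifest for a file-serving container."""
    labels = {"app": "esgf-file-server"}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "esgf-file-server", "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "nginx",
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


# kubectl accepts JSON as well as YAML, e.g. `kubectl apply -f deployment.json`.
print(json.dumps(nginx_file_server_deployment(), indent=2))
```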
Search services
Modernise, centralise and simplify
Use community standards: STAC
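STAC describes each dataset as a GeoJSON-based Item with spatial, temporal and asset metadata. A minimal sketch of a STAC Item (spec v1.0.0) for one dataset, assuming a hypothetical identifier and download URL; how ESGF maps its own facets onto STAC properties is not shown here.

```python
import json


def make_stac_item(item_id, bbox, start, end, asset_href):
    """Return a minimal STAC Item (spec v1.0.0) describing one dataset."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {
            "type": "Polygon",
            # Closed, counter-clockwise exterior ring, as GeoJSON expects.
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south]
            ]],
        },
        "bbox": [west, south, east, north],
        "properties": {
            # A time range is given, so 'datetime' is null and the range fields are set.
            "datetime": None,
            "start_datetime": start,
            "end_datetime": end,
        },
        "links": [],
        "assets": {
            "data": {"href": asset_href, "type": "application/x-netcdf", "roles": ["data"]},
        },
    }


# Hypothetical identifier and URL, for illustration only.
item = make_stac_item(
    "CMIP6.example.tas.v20210101",
    (-180.0, -90.0, 180.0, 90.0),
    "1850-01-01T00:00:00Z",
    "2014-12-31T23:59:59Z",
    "https://data.example.org/tas_example.nc",
)
print(json.dumps(item, indent=2))
```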
ID Management and Access Entitlement
Modernise, centralise and simplify
Use industry standards: OpenID Connect / OAuth 2.0
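With OpenID Connect, a client sends the user to the identity provider with an OAuth 2.0 authorization-code request that includes the `openid` scope. The sketch below builds such a request and adds a PKCE challenge (RFC 7636). The endpoint, client ID and redirect URI are hypothetical placeholders; real values come from the provider's discovery document and client registration.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def s256_challenge(verifier):
    """PKCE S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def pkce_pair():
    """Create a random 43-character code verifier and its S256 challenge."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    return verifier, s256_challenge(verifier)


def authorization_url(authorize_endpoint, client_id, redirect_uri, challenge, state):
    """Build an authorization-code request asking for an OIDC ID token."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile",
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return authorize_endpoint + "?" + urlencode(params)


verifier, challenge = pkce_pair()
print(authorization_url(
    "https://idp.example.org/authorize",  # hypothetical identity provider
    "example-client",
    "https://app.example.org/callback",
    challenge,
    secrets.token_urlsafe(16),
))
```

The client keeps `verifier` and sends it when exchanging the returned code for tokens, which lets the provider confirm that the same client made both requests.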
Progress and Achievements
ESGF Future Architecture
New modes for Data Access + Storage
Augment traditional file serving with object storage
New models for aggregation and subsetting, retire OPeNDAP
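In an object store, a chunked array is held as one object per chunk, so a subset request becomes a list of chunk objects to fetch rather than an OPeNDAP query. A minimal sketch, assuming a 3-D (time, lat, lon) array and the Zarr v2 default chunk-key form `"i.j.k"`; the array and chunk sizes are illustrative.

```python
from itertools import product


def chunk_keys(shape, chunks, selection):
    """List object keys for the chunks overlapping a selection.

    shape, chunks: sizes per dimension, e.g. (time, lat, lon).
    selection: one (start, stop) index range per dimension, stop exclusive.
    Keys follow the Zarr v2 default form "i.j.k".
    """
    ranges = []
    for size, chunk, (start, stop) in zip(shape, chunks, selection):
        stop = min(stop, size)
        first = start // chunk
        last = (stop - 1) // chunk
        ranges.append(range(first, last + 1))
    return [".".join(map(str, idx)) for idx in product(*ranges)]


# 365 daily steps on a 180 x 360 grid, chunked 100 x 90 x 90 (illustrative sizes).
keys = chunk_keys((365, 180, 360), (100, 90, 90), [(0, 10), (80, 100), (0, 90)])
print(keys)  # ['0.0.0', '0.1.0']
```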
Compute Services
Important, but no consensus yet on an ESGF-wide standard offering
Metrics Collection
Leverage advances in industry-standard tooling: Prometheus, InfluxDB and Grafana
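Prometheus collects metrics by scraping a plain-text endpoint that each service exposes. A minimal sketch that renders counters in the Prometheus text exposition format, assuming a hypothetical `esgf_downloads_total` metric labelled by data node; in practice an official client library would normally produce this output.

```python
def render_counters(metric, help_text, samples):
    """Render counter samples in the Prometheus text exposition format.

    samples: list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {metric} {help_text}", f"# TYPE {metric} counter"]
    for labels, value in samples:
        if labels:
            parts = []
            for key, val in sorted(labels.items()):
                # Escape backslashes, quotes and newlines in label values.
                escaped = val.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
                parts.append(f'{key}="{escaped}"')
            lines.append(f"{metric}{{{','.join(parts)}}} {value}")
        else:
            lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: files downloaded per data node.
print(render_counters(
    "esgf_downloads_total",
    "Number of files downloaded.",
    [({"node": "ceda"}, 1027), ({"node": "dkrz"}, 981)],
), end="")
```

Grafana can then chart these series once Prometheus (or InfluxDB) has stored them.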
Progress and Achievements
Future Architecture Node – Phase 1
[Diagram: a Kubernetes cluster behind an Ingress, with a Horizontal Pod Auto-scaler providing elastic auto-scaling; services shown are TDS, Nginx file serving, ESG Search with Solr, the v5 esg-publisher, access control with an identity provider, and metrics, over POSIX storage; a client app provides search and data access]
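The Horizontal Pod Auto-scaler adjusts the number of pods from observed load using the rule documented by Kubernetes: desired = ceil(current replicas × current metric / target metric). The sketch below applies that rule with the default 0.1 tolerance and min/max clamping; it is simplified and omits the real controller's handling of missing metrics and stabilisation windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count from the Horizontal Pod Autoscaler's documented scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within the tolerance, then clamped to [min, max].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Four pods averaging 90% CPU against a 50% target scale out to ceil(4 * 1.8) = 8.
print(desired_replicas(4, 90, 50))  # 8
```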
Future Architecture Node – Phase 2
[Diagram: site deployment(s) and centralised deployment(s), each on its own Kubernetes cluster behind an Ingress; components shown include a Horizontal Pod Auto-scaler (elastic auto-scaling), POSIX storage, access control with an identity provider, TDS, Nginx file serving, STAC search backed by Elasticsearch, metrics, and client apps for data access, data search and publishing]
STAC API for ESGF
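A STAC API exposes an Item Search endpoint (`/search`) that filters by collection, bounding box and RFC 3339 datetime interval. A minimal sketch of a GET search request, assuming a hypothetical API root and a collection named `cmip6`; ESGF's own collection names and any extra filter parameters may differ.

```python
from urllib.parse import urlencode


def stac_search_url(api_root, collections, bbox, start, end, limit=10):
    """Build a GET request URL for the STAC API Item Search endpoint (/search)."""
    params = {
        "collections": ",".join(collections),
        "bbox": ",".join(str(v) for v in bbox),
        "datetime": f"{start}/{end}",  # RFC 3339 interval
        "limit": limit,
    }
    return api_root.rstrip("/") + "/search?" + urlencode(params)


# Hypothetical API root and collection name.
print(stac_search_url(
    "https://stac.example.org/",
    ["cmip6"],
    (-10.0, 35.0, 30.0, 70.0),
    "2015-01-01T00:00:00Z",
    "2020-12-31T23:59:59Z",
))
```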
IS-ENES3 - Data Analytics using Notebooks/icclim
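icclim computes standard climate indices such as SU (summer days: the number of days per year with daily maximum temperature above 25 °C). The plain-Python sketch below illustrates that index definition on a short hypothetical series; it does not use or reproduce icclim's API.

```python
from collections import defaultdict
from datetime import date, timedelta


def summer_days(dates, tasmax_celsius, threshold=25.0):
    """Count days per year with daily maximum temperature above the threshold (SU index)."""
    counts = defaultdict(int)
    for day, value in zip(dates, tasmax_celsius):
        # Adding 0 for cooler days still records the year in the result.
        counts[day.year] += int(value > threshold)
    return dict(counts)


start = date(2019, 12, 30)
dates = [start + timedelta(days=i) for i in range(4)]  # 2019-12-30 .. 2020-01-02
tasmax = [26.0, 24.5, 27.3, 25.0]
print(summer_days(dates, tasmax))  # {2019: 1, 2020: 1}
```

Note that a value of exactly 25.0 does not count, because the index uses a strict "greater than" threshold.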
DestinE and Blueprint Architecture
Destination Earth (DestinE) - major EU initiative:
[Diagram: JASMIN combining data sources, cloud infrastructure for a data analytics platform, and high performance computing for data production / processing]
ESA Digital Twin Earth (DestinE) Precursor - land surface modelling and climate
What could be the future impact of climate change on the soil moisture?
Build an AI surrogate model for JULES
Time series of daily weather data → time series of soil moisture data
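A surrogate learns the mapping from a weather time series to a soil moisture time series, with JULES output supplying the training targets. One common way to arrange such training data is in sliding windows; the sketch below shows that step only. The window length, the feature choice and the values are hypothetical, and the slides do not state which model type or input arrangement was used.

```python
def make_training_pairs(weather, soil_moisture, window):
    """Pair each day's soil moisture with the preceding `window` days of weather.

    weather: list of per-day feature tuples (e.g. precipitation, temperature).
    soil_moisture: list of per-day soil moisture values, aligned with `weather`.
    Returns (inputs, targets), where each input is a window of weather records.
    """
    if len(weather) != len(soil_moisture):
        raise ValueError("weather and soil moisture series must be aligned")
    inputs, targets = [], []
    for t in range(window - 1, len(weather)):
        inputs.append(weather[t - window + 1 : t + 1])
        targets.append(soil_moisture[t])
    return inputs, targets


# Five days of hypothetical (precipitation mm, temperature degC) records.
weather = [(0.0, 12.1), (3.2, 10.4), (0.5, 11.0), (0.0, 13.2), (7.8, 9.9)]
soil = [0.31, 0.33, 0.32, 0.30, 0.36]
X, y = make_training_pairs(weather, soil, window=3)
print(len(X), y)  # 3 [0.32, 0.3, 0.36]
```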
Digital Twin Precursor on JASMIN: HPC for data production, cloud for analysis
[Diagram: managed (internal) JASMIN, where JULES / LAVENDAR data assimilation runs on batch compute (Lotus) and writes netCDF to a Group Workspace (GWS) on SOF [POSIX] storage; and external JASMIN infrastructure, where Cluster-as-a-Service runs in an external cloud tenancy]
The soil moisture model writes netCDF output files to a regular file system
Cluster-as-a-Service deploys a ready-made Jupyter service
Data are moved into the object store so that the Jupyter service on the JASMIN cloud can access them
Data are accessed using the Jupyter Notebook service
Arrangement of data and efficient access
[Figure: data cube with time, latitude and longitude axes]
Object Store: Different storage strategies showed radically different performance
Using the object store to re-arrange data to suit our access patterns
Rechunking the data made interactive maps with long time series possible
Take-home message: use the object store as an analysis-ready cache tailored to project needs
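Why the storage strategy matters so much can be seen by counting the chunks each access pattern has to read. The sketch below compares three chunk shapes for a 20-year daily global grid; the sizes and chunk shapes are illustrative assumptions, not the ones used in the project.

```python
import math

# Illustrative 20-year daily global grid: (time, lat, lon).
TIME_SERIES = (7300, 1, 1)   # every day at one grid point
MAP = (1, 180, 360)          # one day over the whole grid

CHUNKINGS = {
    "one map per chunk": (1, 180, 360),
    "long time series per chunk": (7300, 10, 10),
    "balanced": (365, 30, 30),
}


def chunks_read(chunks, request):
    """Count chunks a chunk-aligned request touches (start index 0 on each axis)."""
    return math.prod(math.ceil(n / c) for n, c in zip(request, chunks))


for name, chunks in CHUNKINGS.items():
    print(f"{name:28s} time series: {chunks_read(chunks, TIME_SERIES):5d}  "
          f"map: {chunks_read(chunks, MAP):4d}")
```

Chunking for maps makes a point time series touch 7300 objects, while chunking for time series makes a single map touch 648; a project-specific rechunked copy lets each workflow pick the layout that suits it.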
Futures
Public Cloud Public Dataset Programme(s)
ESGF Future Architecture >> ESGF 2.0
Institutional-based hosting
Data lakes – Government-sponsored
Acknowledgements + Further Info
@ISENES_RI
@cedanews
@PhilipJKershaw
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement N°824084
IS-ENES3 website
Contact us at
Subscribe to the IS-ENES3 H2020 YouTube channel!
ESGF Future Architecture Report: https://doi.org/10.5281/zenodo.3928222