High-Performance Computing with Python/RS-DAT
OpenGeoHub Summer School, Wageningen, 03-09-2025
The Netherlands eScience Center
National centre / independent foundation since 2012 / NWO & SURF
“Empowering researchers across all disciplines through advanced research software”
Meiert Grootes
Francesco Nattino
Ou Ku
Let’s stay in touch
www.eScienceCenter.nl
info@esciencecenter.nl
+31 20 460 4770
Check out our open calls
EO and RS as a (scientific) resource
All images Credit: ESA/NASA CC-BY
Big data - the challenge of the V’s: Volume, Velocity, Variety
Credit: ESA/NASA, AHN, NCG, SkyGeo
From PC to HPC/HTC
Failure of the status quo
Remote data resources
Desktop / Workstation
Commercial GUI tools
What then?
Scalable Compute
Mass storage
Remote data resources
User
Scalable software stack
HPC/HTC vs Cloud vs Platform
User
Provider
Responsibility
HPC/HTC vs Cloud
Pangeo link
Pythia project link (tutorials and educational material)
Software stack
Scalable Compute
Mass storage
Remote data resources
User
The PANGEO community
Pangeo link
Pythia project link (tutorials and educational material)
Core:
Interoperable, scalable, Python data science stack with geospatial focus
Scalable computation
Rich data model w/ out-of-core support
Interactive analysis and execution
RS-DAT – facilitating HPC/HTC platforms
Scalable Compute
Mass storage
Remote data resources
User
HPC/HTC
What is RS-DAT?
Snellius
SPIDER
SRC
dCache
Remote data resources
User
Community / third-party software stack
Adoption Barrier
Jupyter at scale
RS-DAT Components
Scalable analysis with Jupyter
Using legacy (Docker) containers for HPC
Python interface to dCache
Utility functions to manage STAC catalogs (and the underlying data) on dCache
Today
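To make the STAC component concrete, the following is a minimal sketch of a single catalog entry (a STAC Item) built as a plain Python dictionary. The helper `make_stac_item`, the item id, the bounding box, and the dCache-style asset URL are hypothetical placeholders and not part of RS-DAT; the sketch only illustrates the kind of metadata such utility functions would manage.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, bbox, when, asset_href):
    """Build a minimal STAC Item as a dict (illustrative helper, not an RS-DAT API)."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint as a closed GeoJSON polygon ring (first point == last point).
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": when.isoformat()},
        "links": [],
        # Assets point at the actual data files, e.g. on mass storage.
        "assets": {
            "data": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff",
            }
        },
    }


item = make_stac_item(
    item_id="example-scene",                      # hypothetical id
    bbox=(5.6, 51.9, 5.7, 52.0),                  # hypothetical extent
    when=datetime(2025, 9, 3, tzinfo=timezone.utc),
    asset_href="https://dcache.example.org/data/example-scene.tif",  # hypothetical URL
)
print(json.dumps(item, indent=2))
```

Keeping the metadata (the Item) separate from the data (the asset `href`) is what allows a catalog to be searched without touching the underlying files.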
Excursion: HPC/HTC basics
scheduling
The batch scheduler
Everything scheduled, but inefficient, long wait
Long/resource-heavy jobs may never be scheduled
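The trade-off between these two outcomes can be illustrated with a toy simulation. This is a simplified sketch, not how SLURM or any production scheduler works: all jobs are assumed to be queued at time 0, durations are positive, and the job list and core count are made up. A strict first-in-first-out policy is compared with a greedy policy that starts any queued job that fits.

```python
from collections import namedtuple

Job = namedtuple("Job", "name cores duration")


def simulate(jobs, total_cores, strict_fifo):
    """Return {job name: start time} for a toy scheduler (all jobs queued at t=0).

    strict_fifo=True:  a job may start only after every job ahead of it has started.
    strict_fifo=False: any queued job that fits in the free cores may start.
    """
    if any(job.cores > total_cores for job in jobs):
        raise ValueError("a job requests more cores than the cluster has")
    queue = list(jobs)
    running = []  # list of (end_time, cores)
    starts = {}
    now = 0
    while queue:
        # Release the cores of jobs that have finished by `now`.
        running = [(end, cores) for end, cores in running if end > now]
        free = total_cores - sum(cores for _, cores in running)
        for job in list(queue):
            if job.cores <= free:
                starts[job.name] = now
                running.append((now + job.duration, job.cores))
                free -= job.cores
                queue.remove(job)
            elif strict_fifo:
                break  # the head of the queue blocks everything behind it
        if queue:
            now = min(end for end, _ in running)  # jump to the next completion
    return starts


# Hypothetical queue on a 4-core machine.
jobs = [Job("A", 2, 4), Job("B", 4, 2), Job("C", 1, 1), Job("D", 1, 1)]
fifo = simulate(jobs, total_cores=4, strict_fifo=True)
greedy = simulate(jobs, total_cores=4, strict_fifo=False)
print("strict FIFO:", fifo)
print("greedy fit :", greedy)
```

Under strict FIFO the small jobs C and D sit idle behind the wide job B even though cores are free. Under the greedy policy they start immediately, but B has to wait for a moment when the whole machine is free; if small jobs kept arriving, that moment might never come.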
Jupyter Dask on Slurm
SURF H*C Infrastructure: Spider (HTC), Snellius (HPC); SLURM scheduler
Focus for today
Providing you with experience/expertise to leverage SURF H*C Infrastructure: Spider (HTC), Snellius (HPC); SLURM scheduler
The required materials and the starting point for the interactive session are at https://edu.nl/4jc6u
scheduling
https://slurm.schedmd.com/quickstart.html
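As a rough illustration of what a request to the batch scheduler looks like, the sketch below assembles a small SLURM batch script as a string. The `#SBATCH` directives used are standard `sbatch` options, but the helper `make_sbatch_script`, the partition name, the resource values, and the command are assumptions chosen for illustration; the partitions and limits on Spider or Snellius may differ.

```python
def make_sbatch_script(job_name, partition, cpus, time_limit, command):
    """Return the text of a minimal SLURM batch script (illustrative helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --time={time_limit}",  # wall-clock limit, HH:MM:SS
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


script = make_sbatch_script(
    job_name="jupyter-dask",
    partition="normal",        # hypothetical partition name
    cpus=4,
    time_limit="01:00:00",
    command="jupyter lab --no-browser",
)
print(script)
```

The resulting text could be saved to a file and submitted with `sbatch`. The scheduler then decides when and where the job runs, based on the requested cores and time.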
Solution dichotomy
Pangeo link
Pythia project link (tutorials and educational material)
RS-DAT core in brief
Flexible library for parallel and distributed computing on "big (larger than memory) data".
than-memory data (chunks)
idiosyncrasies)
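The idea of working on larger-than-memory data in chunks can be sketched with the Python standard library alone. This is not the library's implementation, only a minimal illustration of the pattern: the data is read lazily, a bounded number of chunks is held in memory at any time, each chunk is reduced to a small summary by a worker, and the summaries are combined. The function names and chunk sizes are made up, and threads merely stand in for the workers of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_chunks(values, chunk_size):
    """Yield successive lists of at most `chunk_size` items from any iterable."""
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def partial_sum(chunk):
    """Reduce one chunk to a small summary: (sum, count)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size, workers=4):
    """Mean of an iterable, computed chunk by chunk.

    At most `workers` chunks are held in memory at once; each batch of
    chunks is reduced in parallel and only the summaries are kept.
    """
    total, count = 0.0, 0
    chunks = iter_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(islice(chunks, workers))
            if not batch:
                break
            for chunk_total, chunk_count in pool.map(partial_sum, batch):
                total += chunk_total
                count += chunk_count
    if count == 0:
        raise ValueError("cannot take the mean of an empty iterable")
    return total / count


# The generator is consumed lazily, so the full sequence is never materialised.
mean = chunked_mean((x for x in range(1_000_000)), chunk_size=10_000)
print(mean)
```

Because each chunk is reduced independently, the same pattern extends from threads on one machine to workers spread over many nodes.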
What next?
Snellius
SPIDER
SRC
dCache
Remote data resources
User
Scalable software stack
Democratizing Big GeoData