As high-performance computing has evolved to exascale and beyond, filesystem bandwidth has not kept pace with increases in compute speed and memory capacity. The Sustainable Staging Transport (SST) breaches this “I/O Wall” by streaming data directly between the stages of a computational workflow.
Significance and Impact
Because SST uses the existing ADIOS I/O APIs, existing science applications can adopt it easily. Many applications can therefore reduce their overall workflow times, potentially speeding scientific discovery across the range of fields that use HPC resources.
A depiction of an HPC computation using SST to stream data directly to an analysis program instead of using the filesystem as an intermediary. On OLCF’s Frontier exascale machine, SST achieves up to 30 TB/s of parallel throughput, three times the 10 TB/s bandwidth limit of the Orion filesystem.
The 30 TB/s result was achieved with a particle-in-cell (PIC) simulation of the Kelvin-Helmholtz instability (KHI), computed by the PIConGPU code and coupled with a PyTorch-based [16] training code. The training code fits a data-driven model that reconstructs the phase space from the particle data and the in-situ computed radiation data of the simulation.
Technical Approach
Use normal ADIOS read/write APIs for consumer/producer
Stream data and avoid the filesystem with few if any code changes
SST provides synchronization and control while utilizing the native HPC network for actual data transfer
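The streaming pattern described in the points above can be illustrated with a small conceptual sketch. It does not use the ADIOS API: it is a standard-library Python stand-in in which a bounded in-memory queue plays the role of the stream, the producer publishes one step at a time, and the consumer processes each step as it arrives without touching the filesystem. The names `producer`, `consumer`, `run_workflow`, and `queue_limit` are hypothetical and chosen only for illustration. In a real ADIOS application the producer and consumer are separate parallel programs, and the expectation is that the read/write calls stay the same while the engine is set to SST.

```python
import queue
import threading


def producer(stream, n_steps, step_size):
    """Simulation stage: publish one payload per step (hypothetical data)."""
    for step in range(n_steps):
        data = [float(step)] * step_size  # stand-in for simulation output
        stream.put((step, data))          # blocks while the queue is full
    stream.put(None)                      # end-of-stream marker


def consumer(stream, results):
    """Analysis stage: handle each step as soon as it arrives."""
    while True:
        item = stream.get()
        if item is None:                  # producer has finished
            break
        step, data = item
        results.append((step, sum(data)))


def run_workflow(n_steps=4, step_size=3, queue_limit=2):
    """Couple the two stages through a bounded in-memory queue."""
    stream = queue.Queue(maxsize=queue_limit)  # bound gives backpressure
    results = []
    reader = threading.Thread(target=consumer, args=(stream, results))
    reader.start()
    producer(stream, n_steps, step_size)
    reader.join()
    return results


if __name__ == "__main__":
    for step, total in run_workflow():
        print(f"step {step}: sum = {total}")
```

The bounded queue is the design point of interest: when the analysis stage falls behind, the producer blocks instead of buffering without limit, which is a simplified picture of the synchronization and control role that the text attributes to SST.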
PI(s)/Facility Lead(s): G. Eisenhauer; S. Klasky
Collaborating Institutions: Georgia Tech, ORNL
ASCR Program: RAPIDS
ASCR PM: Kalyan Perumalla, Hal Finkel
Publication(s) for this work: G. Eisenhauer, et al., “Streaming Data in HPC Workflows Using ADIOS,” Cray Users Group Technical Conference (2024); github.com/ornladios/ADIOS2