sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC
Scientific Achievement
An ORNL research team improved programming productivity for Kokkos applications by using OpenACC capabilities to let the Kokkos runtime transparently decide which device to use on heterogeneous systems.
Significance and Impact
sKokkos removes the burden of architecture or device selection from Kokkos application developers
Provides a truly performance-portable, high-productivity programming solution for heterogeneous systems
Achieves high speedups and power-efficient hardware use by selecting the appropriate device
Nominated for the Best Paper Award at the ACM HPC Asia’24 conference
Figure 1: Overall LULESH performance on an NVIDIA H100 (Hopper) GPU, showing sKokkos speedups of up to 14x.
Y-axis: execution time (s) on a logarithmic scale (left) and speedup (right). X-axis: input problem size (3D grid). Speedup is computed as the ratio of the slowest KokkACC time (using either a CPU or a GPU) to the sKokkos time.
Technical Approach
Used OpenACC capabilities within Kokkos to support and run on multiple devices
Developed an agnostic, high-level autotuning technique that decides which device to use based on hardware features and application demands
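The device-selection idea can be illustrated with a minimal C++ sketch. This is not the sKokkos autotuner, which weighs hardware features and application demands; it is a simpler timing-based stand-in that runs a kernel on each candidate and keeps the fastest. The names `Candidate`, `time_once`, `pick_fastest`, `select_device`, and `speedup_vs_slowest` are hypothetical and do not come from Kokkos, OpenACC, or sKokkos. The last helper mirrors the speedup definition in the Figure 1 caption.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <iterator>
#include <limits>
#include <string>
#include <vector>

// Hypothetical candidate device: a label plus a callable that runs the kernel.
// In a real runtime the callable would launch the kernel on a CPU or GPU back
// end; here it can be any std::function (assumption for illustration).
struct Candidate {
    std::string name;
    std::function<void()> run_kernel;
};

// Wall-clock time of one invocation, in seconds.
inline double time_once(const std::function<void()>& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Index of the smallest measured time.
inline std::size_t pick_fastest(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    return static_cast<std::size_t>(std::distance(
        seconds.begin(), std::min_element(seconds.begin(), seconds.end())));
}

// Time every candidate (best of `trials` runs) and return the fastest index.
inline std::size_t select_device(const std::vector<Candidate>& candidates,
                                 int trials = 3) {
    std::vector<double> best;
    best.reserve(candidates.size());
    for (const auto& c : candidates) {
        double t = std::numeric_limits<double>::infinity();
        for (int i = 0; i < trials; ++i) {
            t = std::min(t, time_once(c.run_kernel));
        }
        best.push_back(t);
    }
    return pick_fastest(best);
}

// Speedup as defined in the figure caption: slowest fixed-device time
// divided by the time obtained with automatic selection.
inline double speedup_vs_slowest(const std::vector<double>& fixed_device_seconds,
                                 double selected_seconds) {
    assert(!fixed_device_seconds.empty() && selected_seconds > 0.0);
    return *std::max_element(fixed_device_seconds.begin(),
                             fixed_device_seconds.end()) / selected_seconds;
}
```

A caller would build one `Candidate` per available device, call `select_device` once for a representative kernel and problem size, and reuse the returned index for later launches. Timing-based selection costs a few extra kernel runs up front, which is one reason a real autotuner may prefer a model based on hardware and application characteristics.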
PI(s)/Facility Lead(s): Pedro Valero-Lara, Jeffrey S. Vetter; Scott Klasky
Collaborating Institutions: Oak Ridge National Laboratory
ASCR Program: RAPIDS-2 and ECP
ASCR PM: Kalyan Perumalla
Publication(s) for this work: Pedro Valero-Lara, et al., sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC. HPC Asia 2024: 23-34.
doi:10.1145/3635035.3635043
Code developed: https://code.ornl.gov/5pv/skokkos
Pedro Valero-Lara, Seyong Lee, Joel E. Denny, Keita Teranishi, Jeffrey S. Vetter, and Marc Gonzalez Tallada. sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC. HPC Asia 2024: 23-34. doi:10.1145/3635035.3635043