Performance Portability of Sparse Computational Methods on GPU-Accelerated Architectures
Scientific Achievement
Developed an algorithm-centric performance portability metric for evaluating multi-programming-model approaches on DOE GPU-accelerated HPC architectures. Demonstrated its effectiveness for sparse matrix multi-vector solvers.
Significance and Impact
The developed metric and methodology for evaluating performance portability provide a better guide to algorithmic selection than programming-model-centric approaches for algorithms whose behavior is influenced by dataset content, such as sparse matrix methods. Furthermore, a broad range of computational patterns can leverage this methodology.
Research Details
This research studied a wide set of algorithmic and programming models for sparse matrix multi-vector implementations, including vendor-specific programming models (CUDA and HIP) and portability programming models (Kokkos, OpenMP, and OpenACC), across five algorithmic variants.
The study shows that achieving portability depends on the feasibility of expressing an algorithmic variant on each of the programming models.
We developed an algorithm-centric portability metric that enables the evaluation of multi-programming model approaches.
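As an illustration of how such a metric can be computed, the sketch below contrasts a single-model portability score with an algorithm-centric, multi-model score. It uses the widely known harmonic-mean formulation of performance portability over per-platform efficiencies; the specific metric, function names, platforms, and efficiency values here are illustrative assumptions, not the authors' exact formulation or measured data.

```python
# Hedged sketch: harmonic-mean performance portability over platforms,
# and a hypothetical algorithm-centric variant that, for each platform,
# selects the best efficiency among the programming models able to
# express the algorithmic variant.

def performance_portability(efficiency):
    """Harmonic mean of per-platform efficiencies; 0 if any platform fails."""
    if any(e == 0 for e in efficiency.values()):
        return 0.0
    return len(efficiency) / sum(1.0 / e for e in efficiency.values())

def algorithm_centric_pp(eff_by_model):
    """Best efficiency per platform over all models (multi-model selection)."""
    platforms = set().union(*(m.keys() for m in eff_by_model.values()))
    best = {p: max(m.get(p, 0.0) for m in eff_by_model.values())
            for p in platforms}
    return performance_portability(best)

# Hypothetical efficiencies (fraction of achievable peak) per model/platform.
eff = {
    "kokkos": {"A100": 0.60, "MI250X": 0.35},
    "cuda":   {"A100": 0.80, "MI250X": 0.0},   # vendor model: NVIDIA only
    "hip":    {"A100": 0.0,  "MI250X": 0.75},  # vendor model: AMD only
}
print(round(performance_portability(eff["kokkos"]), 3))  # single-model score
print(round(algorithm_centric_pp(eff), 3))               # multi-model score
```

In this sketch, each vendor model alone scores zero (it fails on one platform), while combining models per platform yields a higher score than any single portability model, which is the intuition behind evaluating multi-programming-model approaches.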
Performance portability: portability-model vs. multi-backend approaches.
SciDAC-5 RAPIDS-FastMath
Khaled Z. Ibrahim, Chao Yang, Pieter Maris. Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs, 2022 International Workshop on Performance, Portability and Productivity in HPC (P3HPC), SC22.
The Kokkos performance portability metric varies significantly with problem configuration.
Multi-programming model approaches can improve the observed performance portability across a wide range of configurations.