Porting and Tuning HBPS Applications with TAU and APEX

With the HBPS Partnership (SciDAC), PI: CS Chang

Scientific Achievement

TAU and APEX, scalable performance measurement and visualization tools from the University of Oregon OACISS Institute, help High-Fidelity Boundary Plasma Simulation (HBPS) scientists achieve performance portability on the latest generation of DOE leadership-class systems.

Significance and Impact

TAU’s performance measurement and analysis are helping XGC and GENE developers design new algorithms and parallelization strategies to fully utilize GPUs on Perlmutter and Polaris as well as on Frontier and Aurora development machines. TAU and APEX are also helping the HBPS team improve OpenMP performance on CPUs. Improved computational efficiency and reduced memory usage allow larger fusion energy problems to be explored with fewer compute nodes.

Figure: Scaling performance and timer breakdown of the XGC simulation on Perlmutter (NERSC) relative to Summit (OLCF). With the same number of GPUs, Perlmutter is roughly 2.1x faster. Electromagnetic simulations are a science priority going forward, and in them the electron push kernel is sub-cycled fewer times, so it accounts for a smaller share of the run time. Scaling performance on Polaris (ALCF) is similar to Perlmutter. Credit: A. Scheinberg, Jubilee Development

Technical Approach

  • Measured, analyzed and tuned XGC computational kernels for AMD GPUs as well as NVIDIA GPUs using TAU and APEX.
  • Helped tune XGC OpenMP regions to replace atomic operations with faster reductions.
  • APEX is enabling portable performance comparisons of the GENE simulation across NVIDIA, AMD, and Intel GPUs, helping it fully utilize new systems.
  • TAU and APEX collected hardware counters to generate roofline performance metrics on new AMD GPUs before vendor tools were available, leading to increased computational intensity and shorter time to solution.
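
The second item above, replacing atomic operations with reductions in OpenMP regions, can be illustrated with a minimal sketch. This is not XGC code: the function names and the simple weighted-sum loop are hypothetical, chosen only to show the general pattern. In the atomic version every iteration synchronizes on one shared variable; in the reduction version each thread accumulates a private partial sum and the partial sums are combined once at the end of the loop.

```c
/* Hypothetical illustration only; names and loop body are not from XGC. */

/* Baseline: each iteration updates the shared total with an atomic add,
 * so threads contend on a single memory location. */
double sum_weights_atomic(const double *w, long n)
{
    double total = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp atomic
        total += w[i];
    }
    return total;
}

/* Reduction: each thread keeps a private copy of total, and OpenMP
 * combines the private copies once when the loop finishes. */
double sum_weights_reduction(const double *w, long n)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += w[i];
    }
    return total;
}
```

Both functions compute the same sum up to floating-point rounding order, and both still compile and run serially if OpenMP is disabled, since the pragmas are then ignored. Whether the reduction form is actually faster for a given region is something a profiler such as TAU or APEX would need to confirm.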