Mashy Green & Ilektra Christidi
RSE HPC SIG Online Meetup, 19 May 2025
Custom Acceleration Frameworks: The good, the bad, and the ugly
A case study of our experience as RSEs on a project that uses a Custom Acceleration Framework (CAF)
Software options for offloading
Framework | Pros | Cons |
Native language (CUDA/HIP) | Full control, maximum acceleration possible | Difficult (different language), non-portable |
Custom | Optimal performance and abstraction balance for the specific codebase | Lots of work, code-specific, maintenance is up to the application developers |
Pragma API (OpenACC, OpenMP) | Straightforward, familiar, intuitive, “fully” portable | Available features and performance depend on compiler maturity |
Third party (Kokkos, RAJA, SYCL) | Portable, higher abstraction compared to native | Performance not optimal for all codes |
Note: this is not an endorsement or rejection of any of the options; there are valid use cases for all of them.
Grid’s CAF
Memory manager
The Good
Low level optimisation abstracted away from user
Data types and operators already optimised
The Bad
Where is staple (dSdU_mu) allocated and zeroed?
Before
After
The Ugly
Profile/Debug this:
Conclusions/Recommendations
When designing a CAF