1 of 5

Science of Software Development

a summary of our position at SSSD-U, 2021

Ganesh Gopalakrishnan, with

members of the X-Stack Project ComPort

https://xstack-fp.github.io/

2 of 5

Summary Position

  • Basic correctness-checking tools are lacking for accelerators
  • Need community effort

3 of 5

Science of Software: Need for Concurrency Analyzers

  • Need for tools to check for Concurrency Errors
    • Rising use of accelerators
      • GPUs power HPC and ML
      • GPUs are also central to encryption schemes
      • It is easy to introduce data races into high-performance GPU codes
      • YET
        • There isn't a data race checker for GPUs (source: discussions with academic experts and lab researchers, plus our own experience)
          • Tools like CUDA-Memcheck do not check for global memory races
          • Academic tools in this space do not last, because the technologies they depend on move fast
  • The community has to come together to create basic concurrency correctness checking tools
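
To make the data race point concrete, here is a minimal sketch of a lost update. It is CPU-side Python that replays one bad interleaving deterministically; it is not GPU code and not how any existing checker works. The names racy_increment and atomic_increment are hypothetical, and the plain integer counter merely stands in for a location in GPU global memory.

```python
def racy_increment(num_threads):
    """Replay one bad interleaving: every thread loads, then every thread stores."""
    counter = 0  # stands in for a location in GPU global memory (assumption)
    # Phase 1: each simulated thread reads the counter into a private register.
    registers = [counter for _ in range(num_threads)]
    # Phase 2: each thread writes back its register plus one, unaware of the others.
    for value in registers:
        counter = value + 1
    return counter


def atomic_increment(num_threads):
    """Same work, but each read-modify-write is treated as one indivisible step."""
    counter = 0
    for _ in range(num_threads):
        counter += 1  # load, add, and store modeled as a single step
    return counter


racy_result = racy_increment(4)
atomic_result = atomic_increment(4)
print(racy_result, atomic_result)
```

With four simulated threads the racy version ends at 1 while the indivisible version ends at 4. On a real GPU the interleaving varies from run to run, so the wrong answer appears only intermittently, which is why an automated checker is needed rather than testing alone.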

4 of 5

Science of Software: Need for Numerical Analyzers

  • Need for tools to check for Numerical Errors
    • Rising use of accelerators
      • GPUs power HPC and ML
      • GPUs are also central to encryption schemes
      • It is easy to introduce exceptional numbers into GPU computations
        • NaNs, Overflows, …
        • NaNs may well be flowing into HPC and ML codes today
          • Insights from the results of FPChecker (Laguna, LLNL)
      • YET
        • There are hardly any tools that can look for such basic numerical errors
        • Those that exist are limited by which CUDA subsets they support
          • Tools for non-NVIDIA GPUs would help, but none exist yet
  • The community has to come together to create basic numerical correctness checking tools
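
As a small illustration of how exceptional numbers arise and spread, the sketch below uses ordinary Python floats (IEEE 754 doubles) on the CPU. It is an assumption-laden toy, not GPU code and not a description of FPChecker; the helper names scaled_difference and first_exceptional are hypothetical.

```python
import math


def scaled_difference(a, b, scale):
    """Scale both inputs, then subtract; each product can overflow to infinity."""
    return a * scale - b * scale


def first_exceptional(values):
    """Return the index of the first NaN or infinity, or None if all are finite."""
    for index, value in enumerate(values):
        if math.isnan(value) or math.isinf(value):
            return index
    return None


big = 1e308                                  # close to the largest double
result = scaled_difference(big, big, 10.0)   # inf - inf, which yields NaN
data = [1.0, 2.0, result]
total = sum(data)                            # the NaN contaminates the whole sum

print(result, total, first_exceptional(data))
```

No exception is raised at any step: the overflow silently becomes infinity, the subtraction silently becomes NaN, and the NaN then flows into every downstream result. A check like first_exceptional has to be inserted deliberately, which is the kind of work a numerical analyzer would automate.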

5 of 5

Some Areas to Emphasize

  • Formal Methods
    • Essential for "Science"
  • Testing, including Fuzz Testing
    • Relevant coverage
  • Testing AI Software; AI Software for Testing
    • Two-way street
  • Correctness via Performance Portability Layers
    • Opportunities for creating abstractions that bridge concurrency models
  • Timeliness
    • Older software correctness approaches need a major rethink