1 of 18

Guarding Numerics Amidst Rising Heterogeneity

a presentation at correctness-workshop.github.io/2021

by members of the ComPort project

https://xstack-fp.github.io/ (evolving website)

Ganesh Gopalakrishnan, Ignacio Laguna, Ang Li,

Pavel Panchekha, Cindy Rubio-Gonzalez, Zachary Tatlock

1,4: University of Utah, 2: LLNL, 3: PNNL, 5: University of California, Davis, 6: University of Washington

ganesh@cs.utah.edu, ilaguna@llnl.gov, ang.li@pnnl.gov, pavpan@cs.utah.edu, crubio@ucdavis.edu, ztatlock@cs.washington.edu

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, ComPort: Rigorous Testing Methods to Safeguard Software Porting, under Award Numbers DE-SC0022252 (1,4), SCW 1743 (2), SCW 78284 (3), UCD# (5), UW# (6)

2 of 18

Rising Heterogeneity, Mixed-Precision

3 of 18

Rising Heterogeneity: Data Movement Reduction → Precision Reduction

4 of 18

Rising Heterogeneity

CPUs along with GPUs and custom accelerators in support of:

HPC and ML workloads

We focus on the consequences of GPU adoption

We first describe the broad spectrum of problems

Jointly compiled in our paper

Then the specifics of each problem
And how to solve them through community effort

5 of 18

GPUs: Moving/Evolving

serving different needs...

6 of 18

Challenges due to Increasing GPU/accelerator adoption

7 of 18

Need better formal error models.

Build trust in them outside of ML.

8 of 18

Need GPU Race Checkers; none available now. Dire need to develop.

9 of 18

Nobody trusts or knows how brittle the code will get, and where. Need efficient analysis tools.

10 of 18

HW exceptions are not reported by many GPUs (currently detected via printfs). Develop analysis tools.
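Where hardware does not trap, exceptional values can still be caught in software by inspecting results. Below is a minimal CPU-side sketch of that idea in Python, not a GPU tool; the names `classify`, `checked`, and `EVENTS` are hypothetical helpers introduced only for illustration.

```python
import math

EVENTS = []  # log of (operation name, operands, kind of exceptional result)

def classify(x: float) -> str:
    """Label a double as 'nan', 'inf', 'subnormal', or 'ok'."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    # The smallest positive normal double is 2**-1022;
    # nonzero values below it are subnormal.
    if x != 0.0 and abs(x) < 2.0 ** -1022:
        return "subnormal"
    return "ok"

def checked(op_name, fn, *args):
    """Run fn(*args) and record the call if the result is exceptional."""
    result = fn(*args)
    kind = classify(result)
    if kind != "ok":
        EVENTS.append((op_name, args, kind))
    return result

checked("mul", lambda a, b: a * b, 1e308, 10.0)          # overflows to inf
checked("sub", lambda a, b: a - b, math.inf, math.inf)   # inf - inf gives nan
checked("mul", lambda a, b: a * b, 1e-308, 1e-10)        # lands in the subnormal range
checked("add", lambda a, b: a + b, 1.0, 2.0)             # ordinary result, not logged

for name, args, kind in EVENTS:
    print(name, args, kind)
```

Wrapping every operation this way is far too slow for production kernels; it only shows what kind of information an analysis tool would need to surface.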

11 of 18

Closed-source compilers, Moving Optimization Targets (SFU). Open-Source, Better Specs.
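One well-known way a compiler can change results without changing source code is by contracting a multiply and an add into a fused multiply-add (FMA), which rounds once instead of twice. The sketch below emulates an FMA with exact rational arithmetic; `fma_exact` and `mul_add` are hypothetical names, and the inputs are chosen by hand so that the two evaluations differ.

```python
from fractions import Fraction

def fma_exact(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: form a*b + c exactly, then round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def mul_add(a: float, b: float, c: float) -> float:
    """Unfused version: the product is rounded before the addition."""
    return a * b + c

a = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)

# Exactly, a*a = 1 + 2**-26 + 2**-54. The unfused product rounds the
# 2**-54 term away, so adding c cancels everything.
unfused = mul_add(a, a, c)
# The fused version keeps the 2**-54 term until the single final rounding.
fused = fma_exact(a, a, c)

print(unfused, fused)
```

Whether a given compiler actually fuses a particular expression depends on its flags and target, which is exactly why clearer specifications matter.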

12 of 18

Testing Objectives, Oracles, Fuzzing, Scalable Tracing. Open-Source Testing Tool Components to be Shared.

14 of 18

Challenges and Solutions (summary slide)

FP Formats, Formal Standards, Error Models

FMA, SFU, Tensor Cores

Exceptions

Develop techniques to detect at runtime or pre-analyze and prove absence

Schedule-dependency

Races, Reduction Order Dependence

Compiler optimizations

Performance-portability layers can provide a point to inject parametric solutions

Mixed precision

Not just flashy results but robust engineering, runtime dynamic-range exhaustion detection

Testing and Fuzzing

Need to specify interfaces, better goal-directed fuzzers
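The "Reduction Order Dependence" item above can be illustrated with a small Python sketch: floating-point addition is not associative, so a serial loop and a tree-shaped reduction (the shape a parallel reduction typically takes) can disagree on the same data. The function names and the four input values are illustrative choices, not taken from any particular code.

```python
import math

def sum_left_to_right(values):
    """Serial reduction: accumulate in index order."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_pairwise(values):
    """Tree-shaped reduction: split in half, reduce each half, combine."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])

values = [1e16, 1.0, -1e16, 1.0]

serial = sum_left_to_right(values)   # the first 1.0 is absorbed by 1e16
tree = sum_pairwise(values)          # both 1.0 terms are absorbed
exact = math.fsum(values)            # correctly rounded sum of the inputs

print(serial, tree, exact)
```

On a GPU the effective order can additionally change from run to run when threads combine partial sums through atomics, which turns this into a schedule-dependent result.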

15 of 18

Proprietary Nature of GPUs is a reality

Nvidia is dominant

Good and bad

AMD and Intel on the rise -- but very little experience

Documented uses in HPC and ML hard to come by

16 of 18

Tool Landscape (will refine with your help; other tools?)

Tool	Approach	GPU support	Availability
FPSpy	OS-level instrum.	No GPU	Available
FPChecker (Laguna, ASE'20)	LLVM instrumentation	GPU (initial)	Available
Verificarlo / Verrou	Monte Carlo arithmetic	No GPU	Available
FPDebug, NSan (CC'21), FPSanitizer	Shadow value	No GPU	Some available
Herbgrind	Valgrind instrum.	No GPU	Install issues
Shaman (Nestor)	Modeling error (library based)	No GPU	Available
Ariadne	Exception triggering	No GPU	?
FLiT	Optimization bisection	No GPU	Available
Blossom, S3FP, FPGen	Guided fuzzing	No GPU	Available

17 of 18

Selected Numerical Issues, Solutions, Actionables

Issue	Problems, Where Experienced	Status of Solutions	Most Promising Research Needs
New Number Systems, Exception	No common notions of error	Hype has overshot usage, tools	Fix IEEE issues first; Automate through translation; Invest in education
Precision tuning	Code can become very brittle at places	Tools to check for blown precision budgets are unavailable	Need precision pressure-relief valves; Avoid Precision Fragmentation; Invest more in data compression ("bulk tuning")
Scalable Error Analysis	Many codes have loops; No "one-size-fits-all"	Not all variables are alike (values, derivatives, FFT)	Domain-specific Error Definitions Appear Inevitable
Handling Compiler-Induced Variability	Made difficult by proprietary compilers	Very little progress; compilers don't know what a variable models	Insist on clear compiler specs; Optimize specific to problem semantics
Combined HPC and ML	Increasing in Uptake	Hardly any Tools to support SW Testing	Urgent creation of verification benchmarks; Get traction by pitching around Trustworthy AI
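As a rough illustration of the "Precision tuning" row, a check for a blown range budget could look like the Python sketch below, which asks whether values would survive a move to IEEE binary16. The helper names and sample values are hypothetical, and the overflow test is deliberately conservative (it flags anything above the largest finite binary16 value).

```python
import struct

FP16_MAX = 65504.0            # largest finite binary16 value
FP16_MIN_NORMAL = 2.0 ** -14  # smallest positive normal binary16 value

def fp16_status(x: float) -> str:
    """Say whether x fits the normal range of binary16."""
    ax = abs(x)
    if ax > FP16_MAX:
        return "overflow"
    if 0.0 < ax < FP16_MIN_NORMAL:
        return "underflow"    # would become subnormal or flush to zero
    return "ok"

def round_to_fp16(x: float) -> float:
    """Round an in-range double to binary16 and back via struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

samples = [0.1, 1234.5, 70000.0, 3.0e-6]
report = {x: fp16_status(x) for x in samples}

# Relative rounding error for an in-range value; for normal binary16
# values it is bounded by 2**-11.
rel_err = abs(round_to_fp16(0.1) - 0.1) / 0.1

print(report)
print(rel_err)
```

A real tool would run such checks over traces of actual intermediate values rather than a handful of samples.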

18 of 18

Concluding Remarks

Community Action

FPBench.org
X-Stack Project

Challenge Benchmark Creation

See one benchmark suite proposal at

https://docs.google.com/presentation/d/1b6bAImj_4xGKg8D7iW_fDMmdJODzlJA4/edit?usp=sharing&ouid=111495655245157297413&rtpof=true&sd=true

Need to whittle down, provide a graded series of challenges
Tools to forecast what will happen when precision/platform/formats are changed

Need tool standardization, avoid duplication of effort

Incentivize robust tool release, value real impact of tools on HPC codes

Change reward metrics!

Help from GPU vendors essential to stay abreast

Force hands during procurement -- not just for perf but also correctness tools!
Not just window-dressing but serious commitment

Best Practices to Mix or Change Precision
C++11 memory model adoption (amidst CUDA Atomics, older idioms)
Weaning users away from coding practices such as the "C volatile holy-water sprinkles"