1 of 23

VTK-M

VISUALIZATION FOR THE EXASCALE ERA AND BEYOND

SAND2023-07725C

THE PREMIER CONFERENCE & EXHIBITION ON COMPUTER GRAPHICS & INTERACTIVE TECHNIQUES

© 2023 SIGGRAPH. ALL RIGHTS RESERVED.

2 of 23

ACKNOWLEDGEMENTS

  • This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Award Numbers 10-014707, 12-015215, and 14-017566.
  • This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative.
  • Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.
  • Thanks to many, many partners in labs, universities, and industry.


3 of 23

TERMINOLOGY

  • FLOPS – Floating Point Operations Per Second
  • teraflop – 10^12 FLOPS
  • petaflop – 10^15 FLOPS
  • exaflop – 10^18 FLOPS
  • Exascale – A system capable of sustaining an exaflop (as measured by the HPLinpack benchmark)

  • For Comparison:
    • Current generation AMD/Intel CPUs – 100s of gigaflops
    • Current generation AMD/NVIDIA GPUs – ~10s to 100 teraflops
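To make these scales concrete, here is a small back-of-the-envelope sketch in plain C++ (the constant and function names are illustrative, not from any library):

```cpp
#include <cassert>

// Scale constants from the definitions above, in floating point
// operations per second (FLOPS).
constexpr double kTeraflop = 1e12;
constexpr double kPetaflop = 1e15;
constexpr double kExaflop = 1e18;

// Seconds needed to execute `ops` floating-point operations at a
// sustained rate of `flops` operations per second.
constexpr double secondsFor(double ops, double flops) {
  return ops / flops;
}
```

For example, 10^18 operations take one second on an exascale machine, but on the order of 10^7 seconds (months) on a 100-gigaflop CPU core.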


4 of 23

DOE EXASCALE CLASS COMPUTING SYSTEMS

FRONTIER (OAK RIDGE)

  • 9,472 AMD Epyc CPUs (>600,000 cores)
  • 37,888 AMD MI250X GPUs (>8.3 million cores)
  • 74 cabinets; power consumption 21 MW
  • World's first exascale computer
    • 1.1 exaflops sustained
    • 1.67 exaflops peak

AURORA (ARGONNE)

  • 21,248 Intel Xeon CPU Max Series CPUs
  • 63,744 Intel Data Center GPU Max Series GPUs
  • Theoretical peak: >2 exaflops
  • Fun facts:
    • 300 miles of optical cable
    • 44,000 gallons of water for cooling
    • Weighs >600 tons

5 of 23

ORIGINS OF VTK-M

  • Goals
    1. A single place for the visualization community to collaborate, contribute, and leverage massively threaded algorithms.
    2. Reduce the challenges of writing highly concurrent algorithms by using data parallel algorithms.
    3. Make it easier for simulation codes to take advantage of these parallel visualization and analysis tasks on a wide range of current and next-generation hardware.


6 of 23

ORIGINS OF VTK-M

  • Combined the strengths of three different projects
    • Extreme-scale Analysis and Visualization Library (EAVL), ORNL
      • New mesh layouts, memory efficiency, parallel algorithms, zero-copy for in situ support
    • Data Analysis Toolkit for Extreme Scale (Dax), SNL
      • ParaView plugin, large volumes through streaming to threaded filters
    • Piston, LANL
      • Data-parallel algorithms for many/multi-core, in-situ focused


7 of 23

VTK-M ARCHITECTURE

[Architecture diagram: VTK-m's components — Data Model, Filters, Worklets, Device Algorithms, Execution, Arrays — spanning three levels of engagement (Use, Develop, Research), with each component drawing on concepts from its predecessors EAVL, Dax, and Piston.]

8 of 23

[Diagram: VTK-m filters (Contour, Streams, Clip, Render, Surface Normals, Ghost Cells, Warp) running across device back ends (CPU, CUDA, Xeon Phi, AMD ROCm, Intel GPU).]

9 of 23

VTK-M INTERNALS


10 of 23

MCD3 ARCHITECTURE

  • Meta-DPPs: parallel processing patterns composed of one or more DPPs. The word choice of "meta" is meant to evoke its definition of "denoting something of a higher or second-order kind."
  • Convenience routines: encapsulate common operations for scientific visualization.
  • DPPs: data parallel primitives that provide the basic parallel processing patterns.
  • Data management: insulates algorithms from data layout complexities. These complexities range from how data is organized (e.g., structure-of-arrays vs. array-of-structures) to different types of meshes (e.g., unstructured, rectilinear, etc.) to different memory spaces (e.g., host memory, device memory, or unified managed memory).
  • Devices: enable code to run on a given hardware architecture.
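The layering above can be sketched in plain C++ (this is an illustration of the MCD3 structure, not actual VTK-m code; all names are hypothetical, and the "device" here is a serial loop standing in for a parallel back end):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Device layer: executes a functor over an index range.  A real device
// (CUDA, TBB, Kokkos) would launch these invocations in parallel.
template <typename Functor>
void deviceSchedule(std::size_t n, Functor f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// DPP layer: the map pattern, expressed on top of the device scheduler.
template <typename T, typename UnaryOp>
std::vector<T> dppMap(const std::vector<T>& in, UnaryOp op) {
  std::vector<T> out(in.size());
  deviceSchedule(in.size(), [&](std::size_t i) { out[i] = op(in[i]); });
  return out;
}

// Convenience-routine layer: a common field operation (here a unit
// conversion) packaged over the DPP, hiding the parallelism entirely.
inline std::vector<double> celsiusToKelvin(const std::vector<double>& c) {
  return dppMap(c, [](double v) { return v + 273.15; });
}
```

The point of the layering is that only `deviceSchedule` needs a per-architecture implementation; everything above it is written once.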


K. Moreland, R. Maynard, D. Pugmire, A. Yenpure, A. Vacanti, M. Larsen, and H. Childs. Minimizing Development Costs for Efficient Many-Core Visualization Using MCD3. Parallel Computing, 108:102834, Dec. 2021.

11 of 23

MAP FIELD


[Diagram: the map-field pattern — a functor applied independently to each value of an input field, producing an output field.]
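The map-field pattern invokes a functor once per field value, with each invocation touching only its own input and output slot, so all invocations can run concurrently. A serial plain-C++ sketch of the semantics (not the VTK-m worklet API; names are illustrative):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The functor: computes the magnitude of one per-point vector value.
struct Magnitude {
  double operator()(const std::array<double, 3>& v) const {
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  }
};

// The map-field pattern: one independent functor invocation per value.
std::vector<double> mapField(const std::vector<std::array<double, 3>>& field) {
  std::vector<double> out(field.size());
  for (std::size_t i = 0; i < field.size(); ++i) {
    out[i] = Magnitude{}(field[i]);
  }
  return out;
}
```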

12 of 23

POINT NEIGHBORHOOD


[Diagram: the point-neighborhood pattern — a functor invoked per point with access to field values in a local neighborhood of a structured grid.]
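In the point-neighborhood pattern, each invocation sees a point's value plus the values within a fixed radius on a structured grid. A plain-C++ sketch of the semantics (not the VTK-m API; a radius-1 box average over a 1D grid, clamping at the boundary):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For each point, average the values in its radius-1 neighborhood,
// shrinking the window at the grid boundary.
std::vector<double> boxSmooth(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(in.size());
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    double sum = 0.0;
    int count = 0;
    for (std::ptrdiff_t j = std::max<std::ptrdiff_t>(0, i - 1);
         j <= std::min<std::ptrdiff_t>(n - 1, i + 1); ++j) {
      sum += in[j];
      ++count;
    }
    out[i] = sum / count;
  }
  return out;
}
```

Because the neighborhood is read-only and each invocation writes only its own output slot, all points can still be processed in parallel.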

13 of 23

REDUCE BY KEY


[Diagram: the reduce-by-key pattern — values sharing a key are grouped together, and a functor reduces each group to a single value.]
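Reduce-by-key gathers all values that share a key and reduces each group to one result (VTK-m uses this, for example, when merging coincident points). A plain-C++ sketch of the semantics (not the VTK-m API); as in a data-parallel implementation, sorting by key brings each group together first:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sum the values associated with each key, returning one (key, sum)
// pair per distinct key, in ascending key order.
std::vector<std::pair<int, double>> reduceByKeySum(
    std::vector<std::pair<int, double>> pairs) {
  std::sort(pairs.begin(), pairs.end(),
            [](const auto& a, const auto& b) { return a.first < b.first; });
  std::vector<std::pair<int, double>> out;
  for (const auto& [key, value] : pairs) {
    if (out.empty() || out.back().first != key) out.push_back({key, 0.0});
    out.back().second += value;  // the per-group reduction (here: sum)
  }
  return out;
}
```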

14 of 23

VISIT POINT WITH CELLS


[Diagram: the visit-point-with-cells pattern — a functor invoked per point with access to data from all cells incident on that point.]
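In the visit-point-with-cells pattern, the functor runs once per point and can read values from every cell incident to that point. A plain-C++ sketch of the semantics (not the VTK-m API): averaging a cell-centered field onto the points of a mesh given as per-cell point lists.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each point, average the field values of all cells that contain it.
std::vector<double> cellToPoint(
    std::size_t numPoints,
    const std::vector<std::vector<std::size_t>>& cells,  // point ids per cell
    const std::vector<double>& cellField) {
  std::vector<double> sum(numPoints, 0.0);
  std::vector<int> count(numPoints, 0);
  // Accumulate each cell's value into every point it touches.
  for (std::size_t c = 0; c < cells.size(); ++c)
    for (std::size_t p : cells[c]) {
      sum[p] += cellField[c];
      ++count[p];
    }
  std::vector<double> out(numPoints, 0.0);
  for (std::size_t p = 0; p < numPoints; ++p)
    if (count[p] > 0) out[p] = sum[p] / count[p];
  return out;
}
```

A parallel implementation needs the point-to-cell incidence precomputed (or atomics for the accumulation), which is exactly the connectivity bookkeeping the data-management layer hides.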

15 of 23

VISIT CELL WITH POINTS


[Diagram: the visit-cell-with-points pattern — a functor invoked per cell with access to data from that cell's points.]
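The visit-cell-with-points pattern is the mirror image: the functor runs once per cell and can read the point-centered values at that cell's points. A plain-C++ sketch of the semantics (not the VTK-m API): averaging a point field to cell centers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each cell, average the point-field values at its points.  Each
// invocation only reads shared point data and writes its own cell slot,
// so no synchronization is needed when run in parallel.
std::vector<double> pointToCell(
    const std::vector<std::vector<std::size_t>>& cells,  // point ids per cell
    const std::vector<double>& pointField) {
  std::vector<double> out(cells.size(), 0.0);
  for (std::size_t c = 0; c < cells.size(); ++c) {
    double sum = 0.0;
    for (std::size_t p : cells[c]) sum += pointField[p];
    out[c] = cells[c].empty() ? 0.0 : sum / cells[c].size();
  }
  return out;
}
```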

16 of 23

DOES IT SCALE?


17 of 23

RECENT RESULTS


18 of 23

ACCELERATING PARAVIEW


19 of 23

ACCELERATING POINCARÉ PLOTS


20 of 23

VOLUME RENDERING WITH SHADOWS


  • M. Mathai, M. Larsen, and H. Childs. A Distributed-Memory Parallel Approach for Volume Rendering with Shadows. To appear at the IEEE Symposium on Large Data Analysis and Visualization (LDAV), October 2023.

21 of 23

VOLUME RENDERING WITH SHADOWS


  • M. Mathai, M. Larsen, and H. Childs. A Distributed-Memory Parallel Approach for Volume Rendering with Shadows. To appear at the IEEE Symposium on Large Data Analysis and Visualization (LDAV), October 2023.

22 of 23

RENDERING AT SCALE ON FRONTIER


  • 80 trillion cells
  • 9400 nodes
  • 74,088 GPUs
  • Render time: 300 ms!

23 of 23

CONCLUSIONS

  • VTK-m has proven that it can run at scale
  • VTK-m is currently the only software capable of visualizing simulations on exascale-class systems
  • DOE is currently starting the process for its next set of supercomputers, targeting AI/ML for physics
  • Currently, ParaView/VTK have ~10 filters with VTK-m overrides

  • Interested in working at the extreme edge of computing and graphics?
    • Send a team member a resume, or apply to the intern programs at any of the labs
