
VecGeom Library

VecGeom Developers 4 May 2022


Brief History and Overview of VecGeom

  • Started from R&D effort to unify geometry algorithms
    • Pick best algorithms, optimize, maintain long term
    • Support for multi-track parallelism via SIMD
    • Support for GPUs via CUDA
  • Uses Constructive Solid Geometry (CSG) to represent solids
    • Support for many different primitives, each with its own algorithms
    • Support for complex boolean combinations of primitive shape types

[Figure: timeline of geometry libraries and VecGeom releases. Predecessors: Geant4 Solids (1995), ROOT TGeo (2002), AIDA USolids (2010). VecGeom (AIDA 2020) starts in 2013. Release labels: v0.5, v1.0, v1.1.14, v1.1.16, v1.2.0; date labels: Dec 2017, Jun 2018, May 2021, Jun 2021, May 2022. Annotations: GDML Reader, BVH on CPU, BVH on GPU, Latest Release. Phases: R&D, then production and adoption by experiments.]


[Figure: commit activity]


VecGeom’s Backend Concept


using VectorBackend = vecCore::backend::VcVectorT<Precision>;

using Real_v = vecgeom::VectorBackend::Real_v;

Implementation<Real_v>
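The idea behind the aliases above is that an algorithm is written once against an abstract `Real_v` type and then instantiated for scalar or SIMD types. The following is a minimal sketch of that pattern only; `Real4`, `Lane`, and `LengthSquared` are hypothetical names for illustration and are not part of VecGeom or VecCore.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane type standing in for a real SIMD type such as the
// Real_v supplied by a vector backend. Illustration only.
struct Real4 {
  std::array<double, 4> lane;
};

inline Real4 operator*(const Real4 &a, const Real4 &b)
{
  Real4 r;
  for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
  return r;
}

inline Real4 operator+(const Real4 &a, const Real4 &b)
{
  Real4 r;
  for (std::size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

// Read one lane of the hypothetical vector type.
inline double Lane(const Real4 &v, std::size_t i)
{
  assert(i < 4);
  return v.lane[i];
}

// One implementation, written once against an abstract Real_v.
// Works for Real_v = double (one track) and Real_v = Real4 (four tracks).
template <typename Real_v>
Real_v LengthSquared(Real_v x, Real_v y, Real_v z)
{
  return x * x + y * y + z * z;
}
```

Calling `LengthSquared<double>(1.0, 2.0, 2.0)` processes a single point, while passing three `Real4` values processes four points with the same source code.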


Recent improvements for support on GPUs

  • CMake build system improvements for CUDA (details in another talk)
  • Moved to C++17 for both C++ and CUDA code; requires CUDA 11.x
  • Global navigation index on GPU with cached transformation matrices
  • Support for single-precision with many fixes in volumes and navigation
  • Bulk copy of volume information to GPU to speed up initialization
    • Geometry initialization on the GPU roughly 90% faster than before
  • Bounding Volume Hierarchy (BVH) acceleration support on GPUs
    • Improved Surface Area Heuristic (SAH) algorithm for BVH construction
  • Optimizations to reduce register usage in GPU kernels


Global Navigation Index

  • Large memory footprint for storing the geometry state per ray/track
    • Need to be able to handle O(10^6) tracks in flight on the device
    • Need to store full path of placed volume indices (0,1,3)
    • Large overhead for complex (deeply nested) setups
  • In most cases, a large memory saving is obtained by storing all touchable information in a table and indexing it with a 32-bit integer
    • Optional caching for global transformations
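The table-plus-index idea can be sketched as follows. This is a simplified illustration under assumed data structures, not VecGeom's actual layout: `Volume`, `NavEntry`, `BuildTable`, and `PathOf` are hypothetical names. The hierarchy is expanded once into a flat table, after which each track stores a single 32-bit index instead of a full path.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical logical volume: each entry of `daughters` is the index of
// the logical volume placed as a child.
struct Volume {
  std::vector<int> daughters;
};

// One table row: enough to identify a touchable and walk back to the world.
struct NavEntry {
  std::uint32_t parent;   // table index of the parent (world points to itself)
  std::uint32_t placedId; // which daughter of the parent this node is
  std::uint32_t depth;    // nesting level, 0 for the world
};

// Depth-first expansion of the hierarchy into the flat table.
void Expand(const std::vector<Volume> &volumes, int volume, std::uint32_t parent,
            std::uint32_t placedId, std::uint32_t depth, std::vector<NavEntry> &table)
{
  const std::uint32_t self = static_cast<std::uint32_t>(table.size());
  table.push_back({depth == 0 ? self : parent, placedId, depth});
  const std::vector<int> &daughters = volumes[volume].daughters;
  for (std::uint32_t i = 0; i < daughters.size(); ++i)
    Expand(volumes, daughters[i], self, i, depth + 1, table);
}

std::vector<NavEntry> BuildTable(const std::vector<Volume> &volumes, int world)
{
  std::vector<NavEntry> table;
  Expand(volumes, world, 0, 0, 0, table);
  return table;
}

// Recover the full path of placed-volume ids from a single 32-bit index.
std::vector<std::uint32_t> PathOf(const std::vector<NavEntry> &table, std::uint32_t index)
{
  assert(index < table.size());
  std::vector<std::uint32_t> path(table[index].depth + 1);
  for (std::uint32_t i = index;; i = table[i].parent) {
    path[table[i].depth] = table[i].placedId;
    if (table[i].depth == 0) break;
  }
  return path;
}
```

With this layout the per-track state is 4 bytes regardless of nesting depth; the cost moves to the shared table, which grows with the number of touchables.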

[Figure: geometry tree with placed volumes PWorld, PA_0, PA_1, PA_2 and three copies each of PB_0 and PC_0, annotated with integer indices.]


Transformation Matrix Caching

[Figure: cache size in MB as a function of the depth cached.]

Less than 250 MB for full caching

Caching transformation matrices avoids having to recompute them during navigation, improving run time performance.
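A minimal sketch of depth-limited caching, under loud simplifying assumptions: transformations are reduced to pure translations, and `Node`, `TransformCache`, and `GlobalOf` are hypothetical names rather than VecGeom API. Global transformations are precomputed for nodes up to a chosen depth; deeper nodes compose only the remaining local steps on top of the nearest cached ancestor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplification for illustration: a transformation is a pure translation.
struct Translation {
  double x, y, z;
};

inline Translation Compose(const Translation &parent, const Translation &local)
{
  return {parent.x + local.x, parent.y + local.y, parent.z + local.z};
}

struct Node {
  std::uint32_t parent; // table index of the parent (world points to itself)
  std::uint32_t depth;  // 0 for the world
  Translation local;    // placement relative to the parent
};

struct TransformCache {
  std::vector<Node> nodes;         // parents must precede their children
  std::vector<Translation> global; // valid only where cached[i] is true
  std::vector<bool> cached;

  // Precompute global transformations for all nodes with depth <= maxDepth.
  TransformCache(std::vector<Node> n, std::uint32_t maxDepth)
      : nodes(std::move(n)), global(nodes.size(), Translation{0.0, 0.0, 0.0}),
        cached(nodes.size(), false)
  {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
      const Node &nd = nodes[i];
      assert(nd.depth == 0 || nd.parent < i);
      if (nd.depth > maxDepth) continue;
      global[i] = nd.depth == 0 ? nd.local : Compose(global[nd.parent], nd.local);
      cached[i] = true;
    }
  }

  // Cached nodes cost one lookup; others walk up to the nearest cached ancestor.
  Translation GlobalOf(std::uint32_t i) const
  {
    assert(i < nodes.size());
    if (cached[i]) return global[i];
    if (nodes[i].depth == 0) return nodes[i].local;
    return Compose(GlobalOf(nodes[i].parent), nodes[i].local);
  }
};
```

The `maxDepth` parameter is the trade-off knob: a larger value uses more memory and makes more lookups constant-time, which mirrors the size-versus-depth trade-off on this slide.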


Support for single-precision in VecGeom

  • Tested impact of single-precision on performance in AdePT examples
    • After several fixes for importing/navigating VecGeom geometries in single-precision
      • LoopNavigator (simple looper for children), and BVHNavigator
  • RaytraceBenchmark example (using BVHNavigator) on an RTX 2080 SUPER
    • Reading a GDML file and modeling reflections/refractions and specularity
    • Validated by the output image
      • Very simple geometry: ~7.5% speedup
      • Complex geometry (TrackML): ~44% speedup on GPU, ~13.6% on CPU
  • Physics-enabled GPU examples
    • Example 9 + TrackML + LoopNavigator: ~2.8x speedup
    • Example 11 + TrackML + BVHNavigator: ~30% speedup


Bounding Volume Hierarchy Acceleration

  • First acceleration structure with support for both CPU and GPU
  • Surface Area Heuristic (SAH) algorithm for better quality BVH construction added later, taking into account specific corner cases in HEP geometries
  • Performance similar to other CPU navigators (~25% faster in scalar mode, ~20% slower in vector mode)

[Figure: TrackML geometry]


Surface Area Heuristic (SAH) Algorithm


Split along longest axis, no subdivisions.

SAH now correctly computes subdivisions.
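For reference, the cost that an SAH split minimizes can be sketched as below. This is the textbook formulation, cost = (SA(L)·N_L + SA(R)·N_R) / SA(P), evaluated over all partitions of boxes sorted by centroid along one axis. It is not VecGeom's implementation, and `AABB`, `Split`, and `BestSplit` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct AABB {
  double min[3], max[3];
};

inline AABB Union(const AABB &a, const AABB &b)
{
  AABB r;
  for (int k = 0; k < 3; ++k) {
    r.min[k] = std::min(a.min[k], b.min[k]);
    r.max[k] = std::max(a.max[k], b.max[k]);
  }
  return r;
}

inline double SurfaceArea(const AABB &b)
{
  const double dx = b.max[0] - b.min[0];
  const double dy = b.max[1] - b.min[1];
  const double dz = b.max[2] - b.min[2];
  return 2.0 * (dx * dy + dy * dz + dz * dx);
}

// After sorting, the first `count` boxes go to the left child.
struct Split {
  std::size_t count;
  double cost;
};

// Sorts `boxes` by centroid along `axis` and returns the cheapest partition.
Split BestSplit(std::vector<AABB> &boxes, int axis)
{
  assert(boxes.size() >= 2 && axis >= 0 && axis < 3);
  std::sort(boxes.begin(), boxes.end(), [axis](const AABB &a, const AABB &b) {
    return a.min[axis] + a.max[axis] < b.min[axis] + b.max[axis];
  });
  const std::size_t n = boxes.size();

  // rightArea[i] = surface area of the bounds of boxes[i..n-1]
  std::vector<double> rightArea(n);
  AABB acc = boxes[n - 1];
  for (std::size_t i = n; i-- > 0;) {
    acc = Union(acc, boxes[i]);
    rightArea[i] = SurfaceArea(acc);
  }
  const double parentArea = rightArea[0];
  assert(parentArea > 0.0);

  Split best{0, std::numeric_limits<double>::infinity()};
  acc = boxes[0];
  for (std::size_t i = 1; i < n; ++i) {
    acc = Union(acc, boxes[i - 1]); // bounds of boxes[0..i-1]
    const double cost = (SurfaceArea(acc) * static_cast<double>(i) +
                         rightArea[i] * static_cast<double>(n - i)) / parentArea;
    if (cost < best.cost) best = {i, cost};
  }
  return best;
}
```

For three adjacent boxes and one distant box along the same axis, this cost favors isolating the distant box, whereas a median split would pair it with a near one and produce a large, mostly empty child.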


Bounding Volume Hierarchy CPU Performance


test/XRayBenchmarkFromROOTFile test/cms2015.root MUON y 500


Bounding Volume Hierarchy GPU Performance


Speedup of AdePT simulation of 50 primary electrons at 100 GeV using example 9 (no BVH) and example 11 (BVH) for TrackML geometry, and 1000 primary electrons at 10 GeV using example 13 for CMS geometry (replacing navigators for each run). TrackML uses very few solid types (hence has lower thread divergence on the GPU) compared to CMS. TrackML also has more volumes with many children, which benefit more from BVH acceleration. This translates into a much bigger speedup from BVH acceleration for TrackML geometry than for CMS. Energy deposition and number of secondaries are nearly the same for examples 9 and 11, and between runs of example 13 with/without BVH.


Remarks on GPU Performance

  • Performance has improved significantly on GPU with BVH acceleration
    • Speedup of ~34.6x for TrackML geometry (low thread-divergence, many children per volume)
    • Speedup of ~3.4x for CMS simulation (high thread-divergence, fewer children per volume)
  • Geometry model using CSG not well suited for GPUs
    • Many primitive types ⇒ high thread divergence when computing geometry queries
  • High register usage due to virtual functions, recursion
    • Limits number of threads that can fit into each GPU SM
    • Memory that cannot fit into registers (e.g. stack memory) spills to global memory
    • Lower performance when forcing a lower number of registers per thread
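The link between register usage and threads per SM is simple arithmetic, sketched below. The function is a rough upper bound only: it ignores register allocation granularity, shared memory, and block-size limits, and `ResidentThreads` and its hardware parameters are illustrative assumptions, not values taken from the slides.

```cpp
#include <algorithm>
#include <cassert>

// Upper bound on resident threads per SM imposed by register usage alone.
// All hardware figures are passed in as parameters (assumed values).
int ResidentThreads(int registersPerSM, int maxThreadsPerSM, int registersPerThread)
{
  assert(registersPerSM > 0 && maxThreadsPerSM > 0 && registersPerThread > 0);
  return std::min(maxThreadsPerSM, registersPerSM / registersPerThread);
}
```

Assuming, for illustration, an SM with 65536 registers and a limit of 1024 threads, a kernel using 64 registers per thread can still fill the SM, while one using 128 registers per thread is capped at 512 threads. Forcing fewer registers per thread raises this bound but pushes the excess into spills, which is the trade-off noted in the last bullet.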


Conclusions

  • VecGeom offers most of the functionality needed on GPU
    • Most solid types work on GPU (with a few exceptions)
    • Accelerated navigation on CPU/GPU already available
  • Need an alternative to CSG representation to improve performance
    • Performance depends strongly on number/complexity of solid types used
    • New model using surfaces under consideration (more details in another talk)
  • Some quirks remain
    • VecGeom is CUDA-only, so not portable to AMD/Intel GPUs
    • High memory footprint and thread divergence on GPU limit performance
    • Double compilation of host code causes confusion, especially for new users
