2 of 14

Brief History and Overview of VecGeom

Started from R&D effort to unify geometry algorithms

Pick best algorithms, optimize, maintain long term
Support for multi-track parallelism via SIMD
Support for GPUs via CUDA

Uses Constructive Solid Geometry (CSG) to represent solids

Support for many different primitives, each with its own algorithms
Support for complex boolean combinations of primitive shape types

[Figure: timeline of geometry libraries and VecGeom releases]
Predecessors: Geant4 Solids (1995), ROOT TGeo (2002), AIDA USolids (2010), VecGeom (2013, AIDA 2020)
VecGeom releases shown: v0.5, v1.0, v1.1.14, v1.1.16, v1.2.0
Dates shown: Dec 2017, Jun 2018, May 2021, Jun 2021, May 2022
Milestones shown: GDML Reader, BVH on CPU, BVH on GPU, Latest Release
Phases: R&D, then production and adoption by experiments

3 of 14

Brief History and Overview of VecGeom

[Figure: commit activity]

4 of 14

VecGeom’s Backend Concept

using VectorBackend = vecCore::backend::VcVectorT<Precision>;

using Real_v = vecgeom::VectorBackend::Real_v;

Implementation<Real_v>
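
The backend idea can be sketched without VecCore: an algorithm is written once as a template over `Real_v`, and the backend decides whether `Real_v` is a single number or a pack of lanes. Everything below (`ScalarBackend`, `ToyVectorBackend`, `Vec4`, `Sqrt`, `RadialDistance`) is a hypothetical stand-in for illustration, not VecGeom or VecCore API.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Toy 4-lane value type standing in for a SIMD vector (assumption of this sketch).
struct Vec4 {
  std::array<double, 4> lane;

  friend Vec4 operator+(Vec4 a, const Vec4 &b) {
    for (std::size_t i = 0; i < 4; ++i) a.lane[i] += b.lane[i];
    return a;
  }
  friend Vec4 operator*(Vec4 a, const Vec4 &b) {
    for (std::size_t i = 0; i < 4; ++i) a.lane[i] *= b.lane[i];
    return a;
  }
};

// Square root for both value types, so generic code can call Sqrt(x).
inline double Sqrt(double v) { return std::sqrt(v); }
inline Vec4 Sqrt(Vec4 v) {
  for (std::size_t i = 0; i < 4; ++i) v.lane[i] = std::sqrt(v.lane[i]);
  return v;
}

// Hypothetical backends: each one only names the value type to use.
struct ScalarBackend {
  using Real_v = double; // one track per call
};
struct ToyVectorBackend {
  using Real_v = Vec4; // four tracks per call
};

// Written once against Real_v; works for one track or a pack of tracks.
template <typename Real_v>
Real_v RadialDistance(const Real_v &x, const Real_v &y, const Real_v &z) {
  return Sqrt(x * x + y * y + z * z);
}
```

Calling `RadialDistance<ScalarBackend::Real_v>(3.0, 4.0, 0.0)` processes one track, while passing three `Vec4` values processes four tracks with the same source code.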

5 of 14

Recent improvements for support on GPUs

CMake build system improvements for CUDA (details in another talk)
Moved to C++17 for both C++ and CUDA code, depends on CUDA 11.X
Global navigation index on GPU with cached transformation matrices
Support for single-precision with many fixes in volumes and navigation
Bulk copy of volume information to GPU to speed up initialization

Geometry initialization on the GPU roughly 90% faster than before

Bounding Volume Hierarchy (BVH) acceleration support on GPUs

Improved Surface Area Heuristic (SAH) algorithm for BVH construction

Optimizations to reduce register usage in GPU kernels

6 of 14

Global Navigation Index

Large memory footprint for storing the geometry state per ray/track

Need to be able to handle O(10⁶) tracks in flight on the device
Need to store full path of placed volume indices (0,1,3)
Large overhead for complex (deeply nested) setups

In most cases, a large memory benefit is obtained by storing all touchable information in a table and indexing it with a 32-bit integer

Optional caching for global transformations
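
A minimal sketch of the table idea, assuming a simplified geometry description: every touchable (unique path from the world volume) gets one row, and a track then stores a single 32-bit row index instead of the full path. The types and functions here (`LogicalVolume`, `NavTableEntry`, `BuildNavTable`, `PathOf`) are hypothetical and do not reproduce VecGeom's actual data layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical geometry description: one entry per daughter placement,
// holding the index of the daughter's logical volume.
struct LogicalVolume {
  std::vector<int> daughters;
};

// One row per touchable.
struct NavTableEntry {
  std::uint32_t parent; // row of the mother touchable (the world points to itself)
  int logical;          // logical volume at the end of this path
  int placement;        // daughter slot inside the mother (-1 for the world)
  int depth;            // 0 for the world
};

// Depth-first enumeration of all touchables below `logical`.
void Fill(const std::vector<LogicalVolume> &volumes, int logical, std::uint32_t parent,
          int placement, int depth, std::vector<NavTableEntry> &table) {
  const auto self = static_cast<std::uint32_t>(table.size());
  table.push_back({parent, logical, placement, depth});
  const auto &d = volumes[logical].daughters;
  for (std::size_t i = 0; i < d.size(); ++i)
    Fill(volumes, d[i], self, static_cast<int>(i), depth + 1, table);
}

std::vector<NavTableEntry> BuildNavTable(const std::vector<LogicalVolume> &volumes, int world) {
  std::vector<NavTableEntry> table;
  Fill(volumes, world, 0, -1, 0, table); // the world is row 0 and its own parent
  return table;
}

// Recover the full path of placement indices from a 32-bit state.
std::vector<int> PathOf(const std::vector<NavTableEntry> &table, std::uint32_t state) {
  assert(state < table.size());
  std::vector<int> path;
  while (table[state].depth > 0) {
    path.push_back(table[state].placement);
    state = table[state].parent;
  }
  std::reverse(path.begin(), path.end());
  return path;
}
```

For a world holding three copies of a volume A, each with daughters B and C, the table has 10 rows. As an arithmetic illustration with a hypothetical nesting depth of 16: storing a full path of 4-byte indices for 10⁶ tracks takes 10⁶ × 16 × 4 B = 64 MB, whereas one 32-bit index per track takes 4 MB plus the shared table.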

[Figure: geometry tree of placed volumes: P_World; P_{A_0}, P_{A_1}, P_{A_2}; and three repeated pairs P_{B_0}, P_{C_0}]

7 of 14

Transformation Matrix Caching

[Figure: cache size plot; axis labels: Depth Cached, Size [MB]]

Less than 250 MB for full caching

Caching transformation matrices avoids having to recompute them during navigation, improving run time performance.
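
The trade-off can be sketched as follows, assuming rigid transformations stored per touchable and a table in which parents precede their children. Without a cache, the global transformation is a product of one matrix per level; with a cache, it is a single lookup. The `Transform`, `Node`, `GlobalOnTheFly`, and `BuildCache` names are hypothetical, and the convention used (each local transformation maps daughter coordinates into the mother frame) is a choice of this sketch, not necessarily VecGeom's.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rigid transformation: p_mother = rot * p_daughter + trans (row-major rot).
struct Transform {
  std::array<double, 9> rot{1, 0, 0, 0, 1, 0, 0, 0, 1};
  std::array<double, 3> trans{0, 0, 0};
};

// Returns the transformation "apply b first, then a".
Transform Compose(const Transform &a, const Transform &b) {
  Transform out;
  for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
      double s = 0.0;
      for (int k = 0; k < 3; ++k) s += a.rot[3 * i + k] * b.rot[3 * k + j];
      out.rot[3 * i + j] = s;
    }
    double t = a.trans[i];
    for (int k = 0; k < 3; ++k) t += a.rot[3 * i + k] * b.trans[k];
    out.trans[i] = t;
  }
  return out;
}

struct Node {
  std::uint32_t parent = 0; // the root is its own parent
  Transform local;          // daughter frame -> mother frame
};

// No cache: one matrix product per level, repeated on every query.
Transform GlobalOnTheFly(const std::vector<Node> &nodes, std::uint32_t i) {
  Transform g = nodes[i].local;
  while (nodes[i].parent != i) {
    i = nodes[i].parent;
    g = Compose(nodes[i].local, g);
  }
  return g;
}

// Full cache: computed once; requires parents to precede children in `nodes`.
std::vector<Transform> BuildCache(const std::vector<Node> &nodes) {
  std::vector<Transform> cache(nodes.size());
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    const std::uint32_t p = nodes[i].parent;
    assert(p <= i);
    cache[i] = (p == i) ? nodes[i].local : Compose(cache[p], nodes[i].local);
  }
  return cache;
}
```

In this layout each cached entry costs 12 doubles (96 bytes), so the cache size grows with the number of touchables that are cached, which is why caching only down to a chosen depth is a natural option.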

8 of 14

Support for single-precision in VecGeom

Tested impact of single-precision on performance in AdePT examples

After several fixes for importing/navigating VecGeom geometries in single-precision

LoopNavigator (simple looper for children), and BVHNavigator

RaytraceBenchmark example (using BVHNavigator) on an RTX 2080 SUPER

Reading a GDML file and modeling reflections/refractions and specularity
Validated by the output image

Very simple geometry: ~7.5% speedup
Complex geometry (TrackML): ~44% speedup on GPU, ~13.6% on CPU

Physics-enabled GPU examples

Example 9 + TrackML + LoopNavigator: ~2.8x speedup
Example 11 + TrackML + BVHNavigator: ~30% speedup
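
The slides do not list the individual single-precision fixes. As an illustration of one general class of issue when a geometry code switches from `double` to `float`, the sketch below scales a surface tolerance to the floating-point type instead of using a fixed constant. The macro `SKETCH_SINGLE_PRECISION`, the safety factor 16, and the functions `SurfaceTolerance` and `OnSurface` are all assumptions of this sketch, not VecGeom code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical compile-time switch; the macro name is an assumption.
#ifdef SKETCH_SINGLE_PRECISION
using Precision = float;
#else
using Precision = double;
#endif

// Tolerance proportional to the spacing of representable numbers
// near `lengthScale`. The factor 16 is an arbitrary safety margin.
template <typename T>
constexpr T SurfaceTolerance(T lengthScale) {
  return T(16) * std::numeric_limits<T>::epsilon() * lengthScale;
}

// True if a point at signed distance `d` from a surface counts as lying on it.
template <typename T>
bool OnSurface(T d, T lengthScale) {
  return std::abs(d) <= SurfaceTolerance(lengthScale);
}
```

A fixed tolerance such as 1e-9 is not resolvable in `float` at coordinates of order 1000, because adding it to 1000.0f leaves the value unchanged. A type-dependent tolerance avoids comparisons that can never succeed.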

9 of 14

Bounding Volume Hierarchy Acceleration

First acceleration structure with support for both CPU and GPU
Surface Area Heuristic (SAH) algorithm for better quality BVH construction added later, taking into account specific corner cases in HEP geometries
Performance similar to other CPU navigators (~25% faster in scalar mode, ~20% slower in vector mode)
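
The basic query behind BVH navigation is a ray against axis-aligned bounding box test: a subtree is visited only if the ray hits its box. A minimal sketch of the standard slab test follows. The `Ray` and `Box` types and the `Intersects` function are hypothetical, and this is a generic textbook formulation rather than VecGeom's implementation.

```cpp
#include <algorithm>
#include <utility>

struct Ray {
  double origin[3];
  double dir[3];
};

struct Box {
  double min[3];
  double max[3];
};

// Slab test: true if the ray enters the box for some parameter t in [0, tmax].
bool Intersects(const Ray &ray, const Box &box, double tmax) {
  double tnear = 0.0;
  double tfar = tmax;
  for (int a = 0; a < 3; ++a) {
    if (ray.dir[a] == 0.0) {
      // Ray parallel to this slab: it must already lie between the two planes.
      if (ray.origin[a] < box.min[a] || ray.origin[a] > box.max[a]) return false;
      continue;
    }
    const double inv = 1.0 / ray.dir[a];
    double t0 = (box.min[a] - ray.origin[a]) * inv;
    double t1 = (box.max[a] - ray.origin[a]) * inv;
    if (t0 > t1) std::swap(t0, t1);
    tnear = std::max(tnear, t0); // latest entry over all slabs
    tfar = std::min(tfar, t1);   // earliest exit over all slabs
    if (tnear > tfar) return false;
  }
  return true;
}
```

Passing the current step length as `tmax` lets a traversal skip boxes that lie beyond the closest hit found so far.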

TrackML Geometry

10 of 14

Surface Area Heuristic (SAH) Algorithm

Split along longest axis, no subdivisions.

SAH now correctly computes subdivisions.
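
The difference between the two captions can be illustrated with the textbook form of the heuristic: a candidate split is scored by the surface area of each child box, weighted by the number of objects it contains, relative to the parent box. The sketch below is a generic SAH with unit intersection cost and zero traversal cost, both simplifying assumptions. `AABB`, `SplitCost`, and `BestSplit` are hypothetical names and do not reproduce VecGeom's construction code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct AABB {
  double min[3];
  double max[3];
};

double SurfaceArea(const AABB &b) {
  const double dx = b.max[0] - b.min[0];
  const double dy = b.max[1] - b.min[1];
  const double dz = b.max[2] - b.min[2];
  return 2.0 * (dx * dy + dy * dz + dz * dx);
}

AABB Merge(AABB a, const AABB &b) {
  for (int i = 0; i < 3; ++i) {
    a.min[i] = std::min(a.min[i], b.min[i]);
    a.max[i] = std::max(a.max[i], b.max[i]);
  }
  return a;
}

// SAH cost of splitting `boxes` (already sorted along one axis)
// into [0, k) and [k, n). Requires 1 <= k < n.
double SplitCost(const std::vector<AABB> &boxes, std::size_t k) {
  assert(k >= 1 && k < boxes.size());
  AABB left = boxes[0];
  for (std::size_t i = 1; i < k; ++i) left = Merge(left, boxes[i]);
  AABB right = boxes[k];
  for (std::size_t i = k + 1; i < boxes.size(); ++i) right = Merge(right, boxes[i]);
  const AABB all = Merge(left, right);
  return (SurfaceArea(left) * k + SurfaceArea(right) * (boxes.size() - k)) / SurfaceArea(all);
}

// Exhaustive search for the cheapest split position. Requires n >= 2.
std::size_t BestSplit(const std::vector<AABB> &boxes) {
  assert(boxes.size() >= 2);
  std::size_t best = 1;
  for (std::size_t k = 2; k < boxes.size(); ++k)
    if (SplitCost(boxes, k) < SplitCost(boxes, best)) best = k;
  return best;
}
```

For two adjacent unit cubes and a third one far away along the same axis, a split at the median count (k = 1) groups the distant cube with a near one and produces a large, mostly empty child box, while the SAH picks k = 2 and isolates the distant cube.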

11 of 14

Bounding Volume Hierarchy CPU Performance

test/XRayBenchmarkFromROOTFile test/cms2015.root MUON y 500

12 of 14

Bounding Volume Hierarchy GPU Performance

Speedup of AdePT simulation of 50 primary electrons at 100 GeV using example 9 (no BVH) and example 11 (BVH) for TrackML geometry, and 1000 primary electrons at 10 GeV using example 13 for CMS geometry (replacing navigators for each run). TrackML uses very few solid types (hence has lower thread divergence on the GPU) compared to CMS. TrackML also has more volumes with many children, which benefit more from BVH acceleration. This translates into a much bigger speedup from BVH acceleration for TrackML geometry than for CMS. Energy deposition and number of secondaries are nearly the same for examples 9 and 11, and between runs of example 13 with/without BVH.

13 of 14

Remarks on GPU Performance

Performance has improved significantly on GPU with BVH acceleration

Speedup of ~34.6x for TrackML geometry (low thread-divergence, many children per volume)
Speedup of ~3.4x for CMS simulation (high thread-divergence, fewer children per volume)

Geometry model using CSG not well suited for GPUs

Many primitive types ⇒ high thread divergence when computing geometry queries

High register usage due to virtual functions, recursion

Limits number of threads that can fit into each GPU SM
Memory that cannot fit into registers (e.g. stack memory) spills to global memory
Lower performance when forcing a lower number of registers per thread

14 of 14

Conclusions

VecGeom offers most of the required functionality needed on GPU

Most solid types work on GPU (with a few exceptions)
Accelerated navigation on CPU/GPU already available

Need an alternative to CSG representation to improve performance

Performance depends strongly on number/complexity of solid types used
New model using surfaces under consideration (more details in another talk)

Some quirks remain

VecGeom is CUDA-only, so not portable to AMD/Intel GPUs
High memory footprint and thread divergence on GPU limit performance
Double compilation of host code causes confusion, especially for new users