Accelerating Large-Scale Atomistic td-DFTB Simulations Using GPU Offloading
With the DECODE Partnership (BES)

Scientific Achievement

Significance and Impact

Technical Approach

The computational time for a single time step is shown for four stages of the acceleration process. In the initial build, there is no GPU acceleration and the computation is dominated by slow matrix multiplication. A basic GPU offload radically speeds up the computation of the matrix multiplication but introduces some additional overhead. Optimizing this offloading reduces the overhead and results in a speed-up by a factor of 10 over the baseline implementation.

We accelerated time-dependent density functional tight binding (td-DFTB) simulations — a method used to study the electronic structure and properties of molecules and materials — by a factor of 10. Our new implementation efficiently leverages GPU resources to run simulations of large condensed matter systems containing thousands of atoms with favorable computational scaling as a function of system size.

PI(s)/Facility Lead(s): Mauro Del Ben (LBNL), Khaled Ibrahim (LBNL), Lenny Oliker (LBNL)

Collaborating Institutions: UC Riverside

ASCR Program: SciDAC

ASCR PM: Hal Finkel

Publication(s) for this work: Qiang Xu et al., “Velocity-Gauge Real-Time Time-Dependent Density Functional Tight-Binding for Large-Scale Condensed Matter Systems”, J. Chem. Theory Comput. 2023, 19, 22, 7989–7997, https://doi.org/10.1021/acs.jctc.3c00689

Fully offload the time propagation of the electronic density matrix to the GPU
Distribute large matrices over multiple processors using MPI so that the simulation scales to larger systems
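
To illustrate why dense matrix multiplication dominates each time step, the following is a minimal sketch of density-matrix time propagation. It is not the DECODE implementation: it assumes an orthogonal basis, a time-independent toy 2x2 Hamiltonian, units with hbar = 1, and a generic fourth-order Runge-Kutta integrator chosen only for illustration. All function names (matmul, rhs, rk4_step, propagate) are hypothetical.

```python
# Illustrative sketch only; not the td-DFTB production code.
# Assumptions: orthogonal basis, hbar = 1, toy 2x2 Hamiltonian, RK4 integrator.

def matmul(a, b):
    """Dense matrix product. In a real code this is the kernel offloaded to the GPU."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add_scaled(a, b, s):
    """Return a + s * b elementwise."""
    return [[a[i][j] + s * b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]

def rhs(h, rho):
    """Liouville-von Neumann right-hand side: d(rho)/dt = -i (H rho - rho H)."""
    hr, rh = matmul(h, rho), matmul(rho, h)
    return [[-1j * (hr[i][j] - rh[i][j]) for j in range(len(h))]
            for i in range(len(h))]

def rk4_step(h, rho, dt):
    """One RK4 step; costs four rhs evaluations, i.e. eight matrix products."""
    k1 = rhs(h, rho)
    k2 = rhs(h, add_scaled(rho, k1, dt / 2))
    k3 = rhs(h, add_scaled(rho, k2, dt / 2))
    k4 = rhs(h, add_scaled(rho, k3, dt))
    out = rho
    for k, w in ((k1, 1), (k2, 2), (k3, 2), (k4, 1)):
        out = add_scaled(out, k, w * dt / 6)
    return out

def propagate(h, rho, dt, steps):
    """Advance the density matrix by `steps` time steps of size `dt`."""
    for _ in range(steps):
        rho = rk4_step(h, rho, dt)
    return rho

H = [[0.0, 0.5], [0.5, 1.0]]          # toy Hermitian Hamiltonian
RHO0 = [[1.0 + 0j, 0j], [0j, 0j]]     # electron initially in state 0
rho_t = propagate(H, RHO0, dt=0.01, steps=100)
trace = sum(rho_t[i][i] for i in range(2))
print("trace conserved:", abs(trace - 1.0) < 1e-9)
```

In this sketch every time step reduces to a handful of dense products whose cost grows cubically with matrix dimension, which is why moving that kernel to the GPU and distributing the matrices over MPI ranks are the natural targets for systems with thousands of atoms.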

Despite its broad applicability, the high computational expense of standard time-dependent density functional theory prohibits its use for large systems. The computational efficiency of our td-DFTB implementation enables the simulation of electron dynamics of complex systems that are too large to handle otherwise, allowing researchers to investigate how complex systems like photoactive proteins or catalytic materials behave in their full chemical environments over long simulation windows.

LOCAL POC: Mauro Del Ben (mdelben@lbl.gov) and Khaled Ibrahim (kzibrahim@lbl.gov)

TALKING POINTS:

By efficiently using the computational power of graphics processing units (GPUs), we can speed up simulations of complex molecules and materials by over a factor of 10.
Furthermore, by distributing the data across many processors, we can simulate systems too large to be contained in the memory on a single processor.
This combined ability to quickly simulate very large systems will allow us to investigate how complex systems like photoactive proteins or catalytic materials behave in their full chemical environments over long simulation windows.
The figure (on the right) shows how GPU offloading reduces the computational cost of matrix multiplication (shown in blue) compared to the initial code.

METADATA:

Name of the associated awarded project: DECODE

PI name(s): UC Riverside, Prof. Bryan Wong

Name of the program manager: ASCR: Hal Finkel

CITATIONS:

Qiang Xu et al., “Velocity-Gauge Real-Time Time-Dependent Density Functional Tight-Binding for Large-Scale Condensed Matter Systems”, J. Chem. Theory Comput. 2023, 19, 22, 7989–7997, https://doi.org/10.1021/acs.jctc.3c00689