Scaling Millions of Quantum Chemical Calculations

Scientific Achievement

We present a high-performance, scalable ensemble management framework for running data-intensive quantum chemical electronic structure calculations on organic molecules. The framework provides abstractions for plugging in different first-principles-based semi-empirical methods and executes them efficiently at large scale on HPC systems.

Significance and Impact

The ensemble framework was used to generate four publicly available molecular datasets that contain first-principles calculations for up to 10 million organic molecules. These datasets are used to train neural networks on the Frontier and Perlmutter supercomputers for predicting molecular properties.

The figure shows the architecture of the workflow management framework for running large ensembles of molecular calculations that generate a large number of files. First-principles (FP) calculations are dynamically distributed among the CPU cores to use compute resources efficiently. The framework's ‘Data Plane’ transparently creates a separate staging area on the compute node for every FP calculation. All output files are automatically redirected to this faster staging area, and only the final files are copied to the slower, shared parallel file system (PFS) upon completion. This greatly reduces the load on the PFS, which allows thousands of molecules to be processed concurrently. The framework required no modifications to the FP applications used in this work.
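
The staging pattern described above can be illustrated with a minimal Python sketch. It is not the framework's implementation: the function names `run_staged` and `run_ensemble`, the `keep` patterns, and the `staging_root` parameter are all hypothetical, and a thread pool on a single node stands in for the framework's dynamic distribution across CPU cores. Each calculation runs as an unmodified external command inside its own temporary directory, and only files matching `keep` are copied to the shared location.

```python
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_staged(cmd, final_dir, keep=("*.out",), staging_root=None):
    """Run one calculation in a private staging directory, then copy results.

    staging_root is assumed to point at fast node-local storage; when it is
    None, the system default temporary directory is used.
    """
    final_dir = Path(final_dir)
    final_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=staging_root) as stage:
        # The application is launched unchanged; it writes every file into
        # its working directory, which is the private staging area.
        result = subprocess.run(cmd, cwd=stage, capture_output=True, text=True)
        # Only the final files reach the shared file system; scratch files
        # are discarded together with the staging directory.
        for pattern in keep:
            for path in Path(stage).glob(pattern):
                shutil.copy2(path, final_dir / path.name)
    return result.returncode


def run_ensemble(tasks, workers=4):
    """Run (cmd, final_dir) pairs; idle workers pull the next pending task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_staged, cmd, out) for cmd, out in tasks]
        return [future.result() for future in futures]
```

Because the command is started with its working directory set to the staging area, the application itself needs no changes, which mirrors the property stated in the text.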

Technical Approach

  • The framework combines dynamic task distribution with efficient data management techniques to handle the large volume of data and files produced by the ensemble
  • Scientists can easily plug in new methods and run the ensemble workflow at scale
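
One way such a plug-in abstraction could look is sketched below in Python. This is an illustrative assumption, not the framework's actual interface: the `Method` base class, the `register` decorator, the `REGISTRY` dictionary, and the `example_fp_code` executable are all hypothetical names. The idea is that a new method only has to describe how to build its command line for one molecule, while the ensemble machinery stays unchanged.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Method(ABC):
    """Hypothetical interface: a method knows how to launch itself."""

    name: str

    @abstractmethod
    def command(self, molecule_file: str) -> List[str]:
        """Return the command line for one molecule."""


# Hypothetical registry mapping method names to instances.
REGISTRY: Dict[str, Method] = {}


def register(cls):
    """Class decorator that makes a method selectable by name."""
    REGISTRY[cls.name] = cls()
    return cls


@register
class ExampleMethod(Method):
    name = "example"

    def command(self, molecule_file: str) -> List[str]:
        # "example_fp_code" is a placeholder, not a real executable.
        return ["example_fp_code", "--input", molecule_file]


def build_commands(method_name: str, molecule_files: List[str]) -> List[List[str]]:
    """Build one command per molecule for the selected method."""
    method = REGISTRY[method_name]
    return [method.command(path) for path in molecule_files]
```

Under this sketch, adding a method means writing one small registered class; the code that distributes and stages the calculations does not change.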

PI(s)/Facility Lead(s): Scott Klasky, ORNL

Collaborating Institutions: Oak Ridge National Laboratory

ASCR Program: SciDAC RAPIDS2

ASCR PM: Kalyan Perumalla

Publication(s) for this work: [Accepted for publication] Kshitij Mehta et al., “Scaling Ensembles of Data-Intensive Quantum Chemical Calculations for Millions of Molecules”, IPDPSW 2024

Datasets: [1] 10.13139/OLCF/1890227, [2] 10.13139/OLCF/1907919, [3] 10.13139/OLCF/2318314, [4] 10.13139/OLCF/2318313