Scaling Millions of Quantum Chemical Calculations
Scientific Achievement
We present a high-performance, scalable ensemble management framework for performing data-intensive quantum chemical electronic structure calculations for organic molecules. The framework provides abstractions for plugging in different first-principles-based semi-empirical methods and executes them efficiently at large scale on HPC systems.
Significance and Impact
The ensemble framework was used to generate four publicly available molecular datasets that contain first-principles calculations for up to 10 million organic molecules. These datasets are used to train neural networks on the Frontier and Perlmutter supercomputers for predicting molecular properties.
Technical Approach
The figure shows the architecture of the workflow management framework for running large ensembles of molecular calculations that generate a large number of files. First-principles (FP) calculations are dynamically distributed among the CPU cores to use compute resources efficiently. The framework's ‘Data Plane’ transparently creates a separate staging area on the compute node for every FP calculation. All output files are automatically redirected to the faster staging area, and final files are copied to the slower, shared parallel file system (PFS) upon completion. This greatly reduces the load on the PFS, allowing thousands of molecules to be processed concurrently. The framework required no modifications to the FP applications used in this work.
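The staging pattern described for the Data Plane can be illustrated with a minimal Python sketch. It is not the framework's actual API: the names `run_staged`, `run_ensemble`, and `keep` are hypothetical, a temporary directory stands in for node-local storage, and a thread pool stands in for dynamic distribution across CPU cores. Each calculation runs as an unmodified external command with the staging directory as its working directory, so its output files land there without any change to the application.

```python
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_staged(task_id, command, shared_dir, keep):
    """Run one calculation in a private staging directory, then copy
    the files listed in `keep` to the shared results directory."""
    dest = Path(shared_dir) / task_id
    dest.mkdir(parents=True, exist_ok=True)
    # Assumption: TemporaryDirectory stands in for fast node-local storage.
    with tempfile.TemporaryDirectory(prefix=f"stage_{task_id}_") as stage:
        # The application is unmodified; it simply writes into its
        # current working directory, which is the staging area.
        subprocess.run(command, cwd=stage, check=True)
        for name in keep:
            src = Path(stage) / name
            if src.exists():
                shutil.copy2(src, dest / name)
    # Scratch files are discarded together with the staging directory.
    return task_id


def run_ensemble(tasks, shared_dir, keep, workers=4):
    """Dispatch tasks dynamically: each worker picks up the next pending
    task as soon as it finishes its current one."""
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_staged, task_id, command, shared_dir, keep)
            for task_id, command in tasks.items()
        ]
        for future in as_completed(futures):
            finished.append(future.result())
    return sorted(finished)
```

Calling `run_ensemble({"mol0": ["my_fp_app", "input.in"]}, "/path/to/shared", keep=["results.out"])` would run the (hypothetical) `my_fp_app` command in its own staging directory and copy only `results.out` to the shared location. Threads are sufficient here because the heavy work happens in the child processes; only the selected final files ever touch the shared file system.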
PI(s)/Facility Lead(s): Scott Klasky, ORNL
Collaborating Institutions: Oak Ridge National Laboratory
ASCR Program: SciDAC RAPIDS2
ASCR PM: Kalyan Perumalla
Publication(s) for this work: [Accepted for publication] Kshitij Mehta et al., “Scaling Ensembles of Data-Intensive Quantum Chemical Calculations for Millions of Molecules,” IPDPSW 2024
Datasets: [1] 10.13139/OLCF/1890227, [2] 10.13139/OLCF/1907919, [3] 10.13139/OLCF/2318314, [4] 10.13139/OLCF/2318313