We optimized the collective communication performance of MFDn (many-fermion dynamics for nucleons), a nuclear quantum many-body calculation code, by using two-sided point-to-point and one-sided MPI in place of MPI collectives. On 30 Perlmutter GPU nodes, we achieved a 7x speedup over the original collective implementation.
Significance and Impact
The optimized communication significantly improves the overall scalability of MFDn on DOE leadership-class high-performance computers such as Perlmutter at NERSC, and enables physicists to study a variety of properties of light nuclei with high fidelity.
Figure: Time cost of 100 Lanczos iterations using MPI collectives, MPI point-to-point, and one-sided MPI. The point-to-point version achieves a 7x speedup using 30 Perlmutter GPU nodes. The one-sided MPI time is slightly higher than point-to-point because each communication epoch is enclosed in a pair of MPI_Win_fence calls, which introduces extra barrier time (see the sketch below).
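To make the fence overhead concrete, the following minimal sketch (illustrative only, not MFDn source code; the buffer sizes, partner rank, and window setup are assumptions) writes the same pairwise exchange both ways. The fenced one-sided epoch synchronizes every rank attached to the window, while the two-sided version synchronizes only the communicating pair.

#include <mpi.h>

/* One-sided version: every epoch is bracketed by two MPI_Win_fence calls,
 * which synchronize all ranks attached to the window (the extra barrier
 * cost noted in the caption). 'win' is assumed to expose the receive storage. */
void exchange_one_sided(double *sendbuf, int n, int partner, MPI_Win win)
{
    MPI_Win_fence(0, win);                          /* open epoch  */
    MPI_Put(sendbuf, n, MPI_DOUBLE, partner,
            0, n, MPI_DOUBLE, win);                 /* write into partner's window */
    MPI_Win_fence(0, win);                          /* close epoch */
}

/* Two-sided version: only the communicating pair synchronizes. */
void exchange_point_to_point(double *recvbuf, double *sendbuf, int n,
                             int partner, MPI_Comm comm)
{
    MPI_Request reqs[2];
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, partner, 0, comm, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, partner, 0, comm, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}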
PI(s)/Facility Lead(s): Lenny Oliker (LBL)
Collaborating Institutions: Iowa State University
ASCR Program: SciDAC RAPIDS2, FASTMath
ASCR PM: Kalyan Perumalla (SciDAC RAPIDS2), Steve Lee (FASTMath)
Technical Approach
We use point-to-point communication and one-sided MPI to accelerate MFDn at scale (a minimal sketch of such an exchange follows this list).
We reduce the load imbalance among processes.
We explore the potential of different communication paradigms.
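As a hedged illustration of the first point (an assumed communication pattern, not the actual MFDn kernels; the partner list, counts, and displacements are placeholder parameters), the sketch below replaces an all-to-all style collective with nonblocking point-to-point exchanges, so that each rank communicates only with the ranks it actually shares data with.

#include <mpi.h>

#define MAX_PARTNERS 64   /* assumption for brevity */

/* Each rank posts receives and sends only for its communication partners,
 * instead of calling a blocking collective over the whole communicator. */
void gather_needed_blocks(double *recv, const int *recv_counts, const int *recv_displs,
                          double *send, int send_count,
                          const int *partners, int npartners, MPI_Comm comm)
{
    MPI_Request reqs[2 * MAX_PARTNERS];
    int r = 0;

    for (int i = 0; i < npartners; ++i)         /* post all receives first */
        MPI_Irecv(recv + recv_displs[i], recv_counts[i], MPI_DOUBLE,
                  partners[i], 0, comm, &reqs[r++]);

    for (int i = 0; i < npartners; ++i)         /* then the matching sends */
        MPI_Isend(send, send_count, MPI_DOUBLE,
                  partners[i], 0, comm, &reqs[r++]);

    MPI_Waitall(r, reqs, MPI_STATUSES_IGNORE);  /* other work could overlap before this */
}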