Researchers from AMCR replaced the traditional two-sided inter-process MPI communication within SuperLU’s sparse triangular solvers with a low-latency one-sided implementation.
Significance and Impact
SuperLU preconditioners (sparse triangular solvers / SpTS) are essential components in the linear solvers used by the M3D-C1 and NIMROD fusion simulation codes, but their run time is dominated by MPI communication. Our new one-sided CPU implementation of SpTS improves solver performance and scalability, which benefits these fusion simulations.
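To illustrate why a sparse triangular solve is communication-bound, the following is a minimal serial sketch of forward substitution on a lower-triangular matrix in CSR form. It is not SuperLU code: the function name `sparse_lower_solve` and the CSR argument names are hypothetical, chosen only for this example. Each unknown x[i] depends on previously computed entries x[j]; in a distributed solver those entries may live on other processes, so every such dependency becomes a small message.

```python
def sparse_lower_solve(row_ptr, col_idx, values, b):
    """Solve L x = b by forward substitution.

    L is lower triangular and stored in CSR form (row_ptr, col_idx, values).
    Hypothetical helper for illustration only; not part of SuperLU.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]
            else:
                # Dependency on an earlier unknown x[j]. In a distributed
                # solver, x[j] may be owned by another process and would
                # have to be communicated before this update can run.
                acc -= values[k] * x[j]
        if diag is None or diag == 0:
            raise ValueError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x


# L = [[2, 0, 0],
#      [1, 3, 0],
#      [0, 4, 5]]
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
b = [2.0, 7.0, 23.0]

x = sparse_lower_solve(row_ptr, col_idx, values, b)
print(x)  # [1.0, 2.0, 3.0]
```

Because each row performs very little arithmetic per dependency, the cost of delivering x[j] between processes tends to outweigh the computation, which is the motivation for a lower-latency communication scheme.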
Research Details
Collaboration between CTTS, RAPIDS, and FastMath.
Evaluated the potential of three different one-sided MPI implementations against the baseline two-sided implementation on the EPYC CPUs of NERSC’s Perlmutter.
Achieved up to 1.5x speedup on 128 processes (one Perlmutter CPU node) relative to the two-sided implementation.
Integrated the best one-sided CPU SpTS implementation into SuperLU_DIST.
Leveraging One-Sided Communication for Sparse Triangular Solvers on Traditional CPUs
SciDAC4: CTTS