GPU Accelerated Sparse Cholesky Factorization

Scientific Achievement

We solve sparse symmetric positive definite linear systems by computing a Cholesky factorization of the coefficient matrix with a right-looking approach and then using the resulting triangular factors to compute the solution. We investigate techniques for reducing the factorization time by offloading some of the dense matrix operations to a GPU. We achieved up to a 4x speedup over the CPU-only version.
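As an illustration of the right-looking pattern and of how the triangular factor is then used, here is a minimal dense sketch in pure Python. It is a toy under stated assumptions, not the sparse supernodal code from this work: the function names `cholesky_right_looking` and `solve_spd` are hypothetical, and the matrix is stored densely as a list of rows.

```python
import math

def cholesky_right_looking(A):
    """Return lower-triangular L with A = L * L^T (A must be SPD).

    Hypothetical helper for illustration; dense storage only.
    """
    n = len(A)
    W = [row[:] for row in A]  # working copy of A
    for k in range(n):
        W[k][k] = math.sqrt(W[k][k])      # factor the pivot
        for i in range(k + 1, n):
            W[i][k] /= W[k][k]            # scale column k below the pivot
        # Right-looking step: as soon as column k is done, push its
        # contribution into the trailing submatrix to its right.
        for j in range(k + 1, n):
            for i in range(j, n):
                W[i][j] -= W[i][k] * W[j][k]
    # Keep only the lower triangle.
    return [[W[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]

def solve_spd(A, b):
    """Solve A x = b via A = L L^T, then L y = b and L^T x = y."""
    L = cholesky_right_looking(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                    # forward substitution
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):          # backward substitution
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x
```

In a sparse supernodal code the trailing update touches only the nonzero structure, and the inner loops become dense matrix kernels, which is the part that can be offloaded.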

Significance and Impact

The solution of sparse symmetric positive definite linear systems is an important computational kernel in large-scale scientific and engineering modeling and simulation. Speeding up the Cholesky factorization, which is the computationally intensive part of this kernel, would therefore have a significant impact.

Performance profile for the CPU and GPU versions of RL and RLB, showing the relative performance of the compared methods. For a given method, 1-p(𝜏) is the fraction of test problems on which the method is slower than the fastest solver by more than a factor of 𝜏. The GPU version of RL is the best on all but one test problem, for which RL cannot compute the factorization. RLB closely follows RL. Both RL and RLB, when using the GPU for BLAS calls with large data, are much better than their CPU-only versions.
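The quantity p(𝜏) in such a profile can be computed as in the following sketch. The function name `performance_profile`, the dictionary layout, and the use of `None` to mark a failed factorization are assumptions made for illustration. The timings in `demo_times` are made up and are not results from this work.

```python
def performance_profile(times, tau):
    """Return {method: p(tau)}.

    times maps a method name to its list of run times, one per test
    problem, with None where the method failed. p(tau) is the fraction
    of problems the method solves within a factor tau of the fastest
    method on that problem.
    """
    methods = list(times)
    n_problems = len(times[methods[0]])
    within = {m: 0 for m in methods}
    for i in range(n_problems):
        solved = [times[m][i] for m in methods if times[m][i] is not None]
        if not solved:
            continue  # no method solved this problem
        best = min(solved)
        for m in methods:
            t = times[m][i]
            if t is not None and t <= tau * best:
                within[m] += 1
    return {m: within[m] / n_problems for m in methods}

# Made-up timings for two hypothetical methods on four problems.
demo_times = {"A": [1.0, 2.0, None, 4.0], "B": [2.0, 2.0, 3.0, 5.0]}
profile_at_1 = performance_profile(demo_times, 1.0)
```

A failed run never counts as being within any factor of the best, so a method that fails on one problem cannot reach p(𝜏) = 1 however large 𝜏 is.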

Technical Approach

  • Using a new efficient supernode reordering technique, we maximize the size of the dense submatrices in the supernodes without increasing fill
  • We transfer supernodes between the CPU and GPU as a whole in order to minimize the number of data transfers, which are known to be slow
  • We utilize efficient level-3 BLAS operations on the GPU for the dense parts of the supernodes
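The dense work on one supernode can be sketched as three level-3 style kernels applied to its diagonal block D and the rectangular block B below it. This is a pure-Python stand-in under stated assumptions: the names `potrf`, `trsm`, and `syrk` only mirror the usual BLAS/LAPACK routine names, `factor_supernode` and the `gpu_min_entries` threshold are hypothetical, and the size-based device rule is an assumed reading of "BLAS calls with large data", not the rule used in this work.

```python
import math

def potrf(D):
    """Dense Cholesky of the diagonal block: D = L L^T, L lower triangular."""
    c = len(D)
    L = [[0.0] * c for _ in range(c)]
    for j in range(c):
        L[j][j] = math.sqrt(D[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, c):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (D[i][j] - s) / L[j][j]
    return L

def trsm(B, L):
    """Solve X L^T = B for X, one row of B at a time."""
    c = len(L)
    X = []
    for row in B:
        x = [0.0] * c
        for j in range(c):
            x[j] = (row[j] - sum(x[k] * L[j][k] for k in range(j))) / L[j][j]
        X.append(x)
    return X

def syrk(X):
    """Return X X^T, the update to subtract from later supernodes."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]

def factor_supernode(D, B, gpu_min_entries=1_000_000):
    """Factor one supernode and report where it would run.

    The device choice is a hypothetical size rule. On 'gpu', a real code
    would copy the whole supernode to the device once, run all three
    kernels there, and copy the results back once.
    """
    entries = (len(D) + len(B)) * len(D)
    device = "gpu" if entries >= gpu_min_entries else "cpu"
    L_D = potrf(D)          # factor the diagonal block
    L_B = trsm(B, L_D)      # compute the off-diagonal part of the factor
    update = syrk(L_B)      # contribution to the trailing matrix
    return L_D, L_B, update, device
```

Keeping all three kernels on one device per supernode is what lets the supernode travel as a single unit, so the transfer count grows with the number of offloaded supernodes rather than with the number of kernel calls.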

PI(s)/Facility Lead(s): Esmond Ng; LBNL POC

Collaborating Institutions: Dalton State College

ASCR Program: FASTMath SciDAC Institute

ASCR PM: Steven Lee

Publication(s) for this work:

M. O. Karsavuran, E. G. Ng and B. W. Peyton, "GPU Accelerated Sparse Cholesky Factorization," SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 2024, pp. 703-707, doi: 10.1109/SCW63240.2024.00098.