Value-Compressed Sparse Column (VCSC): Sparse Matrix Storage for Single-cell Omics Data
Big Data 2024 HPC-BOD Workshop
Seth Wolfgang, Skyler Ruiter, Marc Tunnel, Timothy Triche Jr., Erin Carrier, Zachary DeBruine (debruinz@gvsu.edu)
Conventional sparse matrix compression formats do not exploit redundancy among non-zero values to increase compression ratios
Conventional coordinate-based storage of sparse data
Adding value-based compression to CSC
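A minimal sketch of the idea, assuming that value-based compression means storing each distinct non-zero value of a column once, together with a count and the row indices at which it occurs. The function names and the returned layout are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict

def csc_column(dense_col):
    """Return (row_indices, values) for the non-zeros of one dense column,
    i.e. the per-column content that CSC stores."""
    rows = [i for i, v in enumerate(dense_col) if v != 0]
    vals = [dense_col[i] for i in rows]
    return rows, vals

def value_compress_column(rows, vals):
    """Group row indices by value so each distinct value is stored once.

    Returns (unique_values, counts, grouped_rows); this layout is an
    assumption made for illustration.
    """
    groups = defaultdict(list)
    for r, v in zip(rows, vals):
        groups[v].append(r)
    unique_values = sorted(groups)
    counts = [len(groups[v]) for v in unique_values]
    # Row indices laid out contiguously, one run per unique value.
    grouped_rows = [r for v in unique_values for r in groups[v]]
    return unique_values, counts, grouped_rows

col = [0, 1, 0, 2, 1, 0, 1, 2]
rows, vals = csc_column(col)
unique_values, counts, grouped_rows = value_compress_column(rows, vals)
```

In this toy column CSC stores five values, while the grouped layout stores two values and two counts. The saving grows as more non-zeros in a column share the same value.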
Additional compression of indices with bytepacking
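A minimal sketch of bytepacking, assuming it means storing each index in the narrowest whole number of bytes that can hold the largest index in a run, rather than in a fixed 4- or 8-byte integer. The helper names and the little-endian layout are hypothetical choices for illustration.

```python
def byte_width(max_value):
    """Smallest number of bytes able to hold max_value (at least 1)."""
    return max(1, (max_value.bit_length() + 7) // 8)

def pack_indices(indices):
    """Pack non-negative indices using the narrowest fixed byte width."""
    width = byte_width(max(indices, default=0))
    payload = b"".join(i.to_bytes(width, "little") for i in indices)
    return width, payload

def unpack_indices(width, payload):
    """Inverse of pack_indices."""
    return [
        int.from_bytes(payload[k:k + width], "little")
        for k in range(0, len(payload), width)
    ]

indices = [3, 200, 1000, 65535]
width, payload = pack_indices(indices)
restored = unpack_indices(width, payload)
```

Here every index fits in 2 bytes, so the four indices occupy 8 bytes instead of the 16 bytes a 32-bit representation would need.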
A measure of redundancy to quantify compression capability
We define redundancy of the ith column as:
This metric captures the magnitude of the difference between the number of non-zero elements and the number of unique elements.
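One form consistent with this description, offered as an assumption rather than as the authors' exact definition, compares the number of unique non-zero values in column i with the number of non-zeros in that column. The symbols below are hypothetical notation introduced for illustration.

```latex
r_i = 1 - \frac{n^{\text{unique}}_i}{n^{\text{nnz}}_i}, \qquad n^{\text{nnz}}_i > 0
```

Under this assumed form, a column whose non-zeros are all distinct has r_i = 0, and r_i approaches 1 as more non-zeros share the same value. For example, a column with 5 non-zeros and 2 unique values gives r_i = 1 - 2/5 = 0.6.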
Unlike CSC, VCSC and IVCSC compress as a function of both sparsity and value-wise redundancy
Performance of CSC, VCSC, and IVCSC compression on randomly generated redundant sparse matrices
Performance of CSC, VCSC, and IVCSC compression on real sparse matrices
Performance of CSC, VCSC, and IVCSC compression on single-cell transcriptomics datasets
Performance of CSC, VCSC, and IVCSC constructor and iterator
Performance of BLAS routines with CSC, VCSC, and IVCSC
VCSC and IVCSC enable in-core modeling of large single-cell datasets
Conclusion