Graph alignment: Applications to scRNA-seq data integration
Nov 5th 2024
BMI/CS 775 Computational Network Biology, Fall 2024
Sushmita Roy
Plan for this section
Applications of network alignment
Alignment of scRNA-seq datasets
Alignment of molecular networks
Goals for today
Single cell omics
Slide credit: 10x genomics
A single cell RNA-seq dataset
scRNA-seq dataset
genes (6k-20k)
cells (5k-1 million)
Computational problems with scRNA-seq data
Computational tools for single cell omic datasets
Zappia, L. & Theis, F. J. Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape. Genome Biol 22, 301 (2021).
Flavors of data integration
Overall Problem Definition
What makes integration of scRNA-seq datasets difficult?
Aligning high-dimensional datasets
Adapted from “Manifold alignment”, Wang et al 2010
Common approach to aligning scRNA-seq datasets
Batch effect correction of scRNA-seq data using mutual nearest neighbors
MNN for batch correction
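A minimal sketch of the mutual nearest neighbor idea, under stated assumptions: two cells, one from each batch, form an MNN pair when each is among the other's k nearest neighbors. The function name `mutual_nearest_neighbors`, the plain Euclidean distance, and the toy batches are illustrative choices, not the published implementation (which, for example, normalizes expression before computing distances).

```python
import numpy as np

def mutual_nearest_neighbors(A, B, k=3):
    """Return (i, j) pairs where A[i] and B[j] are in each other's k nearest neighbors."""
    # Pairwise squared Euclidean distances between cells of batch A and batch B.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    # For each cell in A, the indices of its k nearest cells in B.
    nn_ab = np.argsort(d, axis=1)[:, :k]
    # For each cell in B, the indices of its k nearest cells in A.
    nn_ba = np.argsort(d, axis=0)[:k, :].T
    pairs = []
    for i in range(A.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:  # keep the pair only if the relation is mutual
                pairs.append((i, int(j)))
    return pairs

rng = np.random.default_rng(0)
batch1 = rng.normal(size=(20, 5))   # toy data: 20 cells x 5 features
batch2 = batch1 + 2.0               # hypothetical batch effect: a constant shift
pairs = mutual_nearest_neighbors(batch1, batch2, k=3)
```

In an MNN-style correction, the expression differences across such pairs would serve as estimates of the batch effect; that step is omitted here.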
Comparing MNN to other methods: simulated dataset
Comparing MNN to other methods: hematopoiesis differentiation dataset
Non-negative matrix factorization
Minimize $\|E - HW\|_F^2$ s.t. $H \ge 0$, $W \ge 0$
Lee and Seung, Adv. Neur. In. 2001
Slide credit: Erika Da-Inn Lee
[Figure: E (cells × genes) ≈ H (cells × factors) × W (factors × genes)]
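The factorization of E into nonnegative H and W can be sketched with multiplicative updates in the style of Lee and Seung for the squared-error objective. This is an illustrative sketch, not the code behind the slides; the function name `nmf`, the iteration count, the small constant `eps`, and the toy matrix are assumptions.

```python
import numpy as np

def nmf(E, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix E (n x m) as H (n x k) @ W (k x m)."""
    rng = np.random.default_rng(seed)
    n, m = E.shape
    # Random positive initialization keeps every entry nonnegative throughout.
    H = rng.random((n, k)) + eps
    W = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius error ||E - HW||^2.
        H *= (E @ W.T) / (H @ W @ W.T + eps)
        W *= (H.T @ E) / (H.T @ H @ W + eps)
    return H, W

rng = np.random.default_rng(1)
E = rng.random((30, 4)) @ rng.random((4, 12))   # toy nonnegative matrix of rank 4
H, W = nmf(E, k=4)
relative_error = np.linalg.norm(E - H @ W) / np.linalg.norm(E)
```

Because each update only multiplies by a nonnegative ratio, H and W stay nonnegative without an explicit projection step.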
Using NMF factors for clustering
[Figure: cells grouped using the factor matrices H and W]
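One simple way to turn NMF factors into clusters, sketched here as an assumption rather than the slide's exact procedure, is to assign each cell to the factor with the largest loading in its row of H. The function name `clusters_from_factors` and the toy matrix are hypothetical.

```python
import numpy as np

def clusters_from_factors(H):
    """Assign each cell (row of H) to the factor with the largest loading."""
    return H.argmax(axis=1)

# Toy cell-side factor matrix: 4 cells x 2 factors.
H = np.array([[0.9, 0.1],
              [0.8, 0.0],
              [0.1, 0.7],
              [0.0, 0.5]])
labels = clusters_from_factors(H)   # array([0, 0, 1, 1])
```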
Applying NMF to a single cell RNA-seq dataset
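$X \in \mathbb{R}^{n \times m}$: original value matrix (n cells × m genes)
$H \in \mathbb{R}^{n \times k}$ (n cells × k factors)
$W \in \mathbb{R}^{k \times m}$ (k factors × m genes)
$HW$: predicted matrix (n cells × m genes)
$k \ll n, m$
[Figure: X ≈ HW; panel labels: Original value matrix, Predicted matrix, U, Factorized cell-side matrix, Cell clusters]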
Extensions to NMF
Joint NMF
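$E_1 \approx H_1 W$, $E_2 \approx H_2 W$
[Figure: datasets E1 and E2 (cells × genes) share the gene-side matrix W; H1 and H2 are the dataset-specific cell-side matrices]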
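A hedged sketch of joint NMF with a shared gene-side matrix W: each dataset keeps its own cell-side matrix, and W is updated from all datasets at once, which is equivalent to running NMF on the datasets stacked by rows. The function name `joint_nmf`, the multiplicative updates, and the toy data are illustrative assumptions.

```python
import numpy as np

def joint_nmf(E_list, k, n_iter=200, eps=1e-9, seed=0):
    """Factor each E_i (n_i x m) as H_i (n_i x k) @ W (k x m) with one shared W."""
    rng = np.random.default_rng(seed)
    m = E_list[0].shape[1]            # all datasets must share the same genes
    W = rng.random((k, m)) + eps
    H_list = [rng.random((E.shape[0], k)) + eps for E in E_list]
    for _ in range(n_iter):
        # Update each dataset-specific cell-side matrix with W held fixed.
        for i, E in enumerate(E_list):
            H_list[i] *= (E @ W.T) / (H_list[i] @ W @ W.T + eps)
        # Update the shared W using contributions pooled over all datasets.
        num = sum(H.T @ E for H, E in zip(H_list, E_list))
        den = sum(H.T @ H for H in H_list) @ W + eps
        W *= num / den
    return H_list, W

rng = np.random.default_rng(1)
G = rng.random((3, 10))               # toy shared gene programs
E1 = rng.random((15, 3)) @ G          # dataset 1: 15 cells x 10 genes
E2 = rng.random((25, 3)) @ G          # dataset 2: 25 cells x 10 genes
(H1, H2), W = joint_nmf([E1, E2], k=3)
```

Since both H1 and H2 are expressed in the coordinates of the same W, cells from the two datasets can be compared directly in factor space.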
Integrative NMF
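$E_i \approx H_i (W + V_i)$ for $i = 1, 2, 3$
[Figure: three datasets E1, E2, E3 (cells × genes); W is shared across datasets, V1, V2, V3 are dataset-specific, and H1, H2, H3 are the cell-side factor matrices]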
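To make the roles of the shared and dataset-specific factors concrete, the sketch below evaluates an integrative NMF objective of the form used in the published LIGER formulation: a reconstruction term per dataset plus a penalty, weighted by a tuning parameter, on the dataset-specific part. The function name `inmf_objective`, the parameter name `lam`, and its default value are assumptions for illustration; no optimizer is shown.

```python
import numpy as np

def inmf_objective(E_list, H_list, W, V_list, lam=1.0):
    """Sum over datasets of ||E_i - H_i (W + V_i)||_F^2 + lam * ||H_i V_i||_F^2."""
    total = 0.0
    for E, H, V in zip(E_list, H_list, V_list):
        # Reconstruction error using shared (W) plus dataset-specific (V) factors.
        total += np.linalg.norm(E - H @ (W + V), "fro") ** 2
        # Penalty that discourages large dataset-specific contributions.
        total += lam * np.linalg.norm(H @ V, "fro") ** 2
    return float(total)

W = np.array([[1.0, 2.0], [3.0, 4.0]])
H = np.eye(2)
V_zero = np.zeros((2, 2))
V_one = np.array([[1.0, 0.0], [0.0, 0.0]])
exact = inmf_objective([W], [H], W, [V_zero], lam=2.0)      # perfect fit, no penalty
penalized = inmf_objective([W], [H], W, [V_one], lam=2.0)   # misfit 1 + penalty 2
```

A larger `lam` pushes more of the signal into the shared W, while a smaller value lets each V_i absorb more dataset-specific variation.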
LIGER
LIGER key steps
LIGER: Defining/refining cell clusters
Benchmarking LIGER
Applying LIGER to integrate multiple datasets
Cell clusters
Gene markers
Donor-specific and shared genes
Using LIGER to integrate scRNA-seq and spatial transcriptomics data
Distribution of gene expression per cell between two platforms
scRNA-seq: 71,000 cells
Spatial: 2,500 cells
Using LIGER to integrate scRNA-seq and spatial transcriptomics data
Spatial location of cell clusters
scRNA-seq
Spatial
Summary of algorithms
Algorithm | Dimensionality reduction technique | Graph creation | Cell clustering |
MNN | PCA (optional) | Mutual nearest neighbor | |
Scanorama | SVD | Mutual nearest neighbor on factor space | k-means |
LIGER | iNMF | NMF + shared neighborhood | Louvain/Leiden |
Seurat | CCA | k nearest neighbor | Louvain |
Take-away points
References
Singular Value Decomposition
By Cmglee - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=67853297
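A short sketch of how a truncated SVD gives a low-dimensional cell embedding, using NumPy's `np.linalg.svd`. The helper name `truncated_svd`, the toy matrix, and the choice of k are illustrative assumptions.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the top-k singular triplets of X, so that X ≈ U @ diag(s) @ Vt."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))            # toy matrix: 8 cells x 5 genes

# Keeping all 5 components reproduces X up to floating-point error.
U, s, Vt = truncated_svd(X, k=5)
X_full = U @ np.diag(s) @ Vt

# Keeping 2 components gives a 2-dimensional embedding of the cells.
U2, s2, Vt2 = truncated_svd(X, k=2)
cell_embedding = U2 * s2               # shape (8, 2)
```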
Canonical Correlation Analysis