Inferring regulatory networks from scRNA-seq datasets
Oct 5th 2021
BMI 826-23 Computational Network Biology
Fall 2021
Sushmita Roy
Plan for this section
Plan for today
scRNA-seq: a powerful technology to understand heterogeneous cell populations
Bulk
Single-cell
Figures: 10X Genomics; Shalek AK et al., Nature 2014
Computational problems with scRNA-seq data
Inferring networks from scRNA-seq data
Pseudotime
Bergen et al., 2021
If we think of each cell as capturing a snapshot of a dynamic process, can we order cells based on their transcriptional signatures?
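To make the ordering idea concrete, here is a toy sketch that projects cells onto their first principal component and sorts them by that score. It is only an illustration under strong assumptions (a single linear trajectory, no branching). Published pseudotime methods are considerably more sophisticated, and the function names below are hypothetical.

```python
import random


def first_pc_scores(cells, n_iter=100, seed=0):
    """Score each cell by its projection onto the first principal component.

    cells: list of equal-length expression vectors (one per cell).
    The component is found by power iteration on the centered data.
    """
    n, g = len(cells), len(cells[0])
    means = [sum(c[j] for c in cells) / n for j in range(g)]
    centered = [[c[j] - means[j] for j in range(g)] for c in cells]

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(g)]  # random start direction
    for _ in range(n_iter):
        # One power-iteration step: v <- X^T (X v), then normalize.
        proj = [sum(row[j] * v[j] for j in range(g)) for row in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(g)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # degenerate data (no variance along v)
        v = [x / norm for x in w]
    return [sum(row[j] * v[j] for j in range(g)) for row in centered]


def pseudotime_order(cells):
    """Return cell indices sorted by their first-PC score."""
    scores = first_pc_scores(cells)
    return sorted(range(len(cells)), key=lambda i: scores[i])


# Toy data: three genes that change linearly with a hidden time t,
# with the cells presented in scrambled order.
hidden_time = [3, 0, 4, 1, 2]
toy_cells = [[t, 2 * t, 10 - t] for t in hidden_time]
order = pseudotime_order(toy_cells)
```

The recovered order matches the hidden time only up to direction: a principal component has no preferred sign, so the trajectory may come out reversed.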
Classes of network inference algorithms
Adapted from Babtie et al., Current Opinion in Systems Biology 2017; Stone et al., unpublished
How to model an edge?
Pseudotime?
PIDC
A few information theoretic concepts
RECAP: Mutual Information
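As a reminder of what mutual information measures, the sketch below estimates I(X;Y) in bits from paired discrete observations using plug-in (empirical) frequencies. It assumes expression values have already been discretized into bins. The function name is hypothetical, and this simple estimator is not necessarily the one used by any particular tool.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Two perfectly coupled binary genes share 1 bit; independent ones share 0.
coupled = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```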
RECAP: Conditional mutual information
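For reference, the standard definition of conditional mutual information for discrete variables is written out below. It asks how much X and Y share once Z is known. The symbols X, Y, Z are generic random variables, here standing for the expression of three genes.

```latex
\[
I(X;Y \mid Z)
  \;=\; \sum_{z} p(z) \sum_{x,y} p(x,y \mid z)\,
        \log \frac{p(x,y \mid z)}{p(x \mid z)\,p(y \mid z)}
  \;=\; H(X \mid Z) - H(X \mid Y,Z)
\]
```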
Interaction information
Conditional mutual information
Mutual information
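One common way to combine these two quantities into interaction information is shown below. Sign conventions differ between authors, so the form here (conditional minus unconditional mutual information) is an assumption about the convention used on the slide.

```latex
\[
II(X;Y;Z) \;=\; I(X;Y \mid Z) \;-\; I(X;Y)
\]
```

Under this convention a positive value means that knowing Z increases the dependence between X and Y (net synergy), and a negative value means Z accounts for part of it (net redundancy).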
Partial Information Decomposition
Redundancy
Unique information
Synergistic information
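The three components fit together as in the three-variable decomposition below, written in the notation commonly used for PID (Williams and Beer). The subscript on Unique names the variable that is excluded: Unique_Z(X;Y) is information shared by X and Y that Z does not provide.

```latex
\begin{align*}
I(X;\{Y,Z\}) &= \mathrm{Synergy}(X;\{Y,Z\}) + \mathrm{Unique}_Y(X;Z)
              + \mathrm{Unique}_Z(X;Y) + \mathrm{Redundancy}(X;\{Y,Z\}) \\
I(X;Y) &= \mathrm{Unique}_Z(X;Y) + \mathrm{Redundancy}(X;\{Y,Z\}) \\
I(X;Z) &= \mathrm{Unique}_Y(X;Z) + \mathrm{Redundancy}(X;\{Y,Z\}) \\[4pt]
I(X;Y \mid Z) - I(X;Y) &= \mathrm{Synergy}(X;\{Y,Z\}) - \mathrm{Redundancy}(X;\{Y,Z\})
\end{align*}
```

The last line follows from the first three using the chain rule I(X;Y|Z) = I(X;{Y,Z}) - I(X;Z). It shows that the difference between conditional and unconditional mutual information captures only the net balance of synergy and redundancy, whereas PID separates the two.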
Unique information can discriminate between edges and non-edges
Using PID for network inference
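As a sketch of how unique information can be turned into an edge score, the PIDC paper defines a proportional unique contribution (PUC) for each gene pair, summing over every third gene Z in the gene set S. The form below is reproduced from memory of that paper and should be checked against it before use.

```latex
\[
u_{X,Y} \;=\;
  \sum_{Z \in S \setminus \{X,Y\}} \frac{\mathrm{Unique}_Z(X;Y)}{I(X;Y)}
  \;+\;
  \sum_{Z \in S \setminus \{X,Y\}} \frac{\mathrm{Unique}_Z(Y;X)}{I(X;Y)}
\]
\[
c_{X,Y} \;=\; F_X(u_{X,Y}) + F_Y(u_{X,Y})
\]
```

Here F_X denotes the empirical cumulative distribution of all PUC scores involving gene X, so the confidence c is high when the pair's PUC is large relative to the other pairs each gene participates in.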
PIDC on simulated data
Application of PIDC to single cell data
SCENIC
SCENIC workflow
Use GENIE3 to predict target genes of each regulator.
These are likely co-expressed
Use RcisTarget to identify which targets have binding-motif support for the regulator. The motif-filtered target sets are called regulons.
Use AUCell to assess how active a regulon is in each cell.
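The last step of the workflow scores a regulon by how strongly its genes are enriched among the top-ranked genes of a cell. The sketch below is a simplified, normalized recovery-curve AUC in plain Python. It is not the AUCell package's API or exact algorithm, and the function name, the `top_fraction` parameter, and the normalization are illustrative assumptions.

```python
def regulon_activity(expression, regulon, top_fraction=0.05):
    """Toy recovery-curve AUC for one regulon in one cell.

    expression: dict mapping gene name -> expression value in this cell.
    regulon: set of target gene names.
    Returns a value in [0, 1]; 1 means the regulon genes sit at the very
    top of the cell's expression ranking.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    k = max(1, int(round(top_fraction * len(ranked))))

    # Walk down the top-k genes and accumulate the recovery curve.
    hits = 0
    area = 0
    for gene in ranked[:k]:
        if gene in regulon:
            hits += 1
        area += hits

    # Best possible area: all regulon genes ranked first.
    m = min(len(regulon), k)
    max_area = sum(min(i, m) for i in range(1, k + 1))
    return area / max_area if max_area else 0.0


# Hypothetical cell with ten genes, g1 most expressed and g10 least.
cell = {f"g{i}": 11 - i for i in range(1, 11)}
active = regulon_activity(cell, {"g1", "g2"}, top_fraction=0.5)
inactive = regulon_activity(cell, {"g9", "g10"}, top_fraction=0.5)
```

Because the score depends only on the within-cell ranking of genes, it is insensitive to per-cell scaling differences such as sequencing depth.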
Evaluation of SCENIC
Application of SCENIC on a dataset with known cell labels
SCENIC accurately predicts regulators for known cell types. Clustering of cell types is more accurate when using SCENIC regulons.
Benchmarking single-cell network inference algorithms
A. Pratapa, A. P. Jalihal, J. N. Law, A. Bharadwaj, and T. M. Murali, “Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data,” Nat Methods, vol. 17, no. 2, pp. 147–154, Feb. 2020, doi: 10.1038/s41592-019-0690-6.
Performance of algorithms on simulated data
Different network topologies
Number of genes: 7-18
Performance on real data
Each number is the Early Precision Ratio (EPR). Higher is better.
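To make the metric concrete, here is a minimal sketch of the Early Precision Ratio as described in the benchmarking paper: early precision is the fraction of true edges among the top-k predictions, where k is the number of edges in the reference network, and the ratio divides this by the precision expected from a random predictor (the network density). The function name and the choice to exclude self-loops when counting possible edges are assumptions.

```python
def early_precision_ratio(ranked_edges, true_edges, n_genes):
    """Early precision of the top-k edges divided by random expectation.

    ranked_edges: list of (regulator, target) pairs, best prediction first.
    true_edges: iterable of (regulator, target) pairs in the reference network.
    n_genes: number of genes; possible edges are directed, without self-loops.
    """
    true_edges = set(true_edges)
    k = len(true_edges)
    top_k = ranked_edges[:k]
    early_precision = sum(edge in true_edges for edge in top_k) / k

    n_possible = n_genes * (n_genes - 1)
    density = k / n_possible  # expected precision of a random ranking
    return early_precision / density


# Hypothetical 3-gene example with two true edges.
truth = [("A", "B"), ("B", "C")]
ranking = [("A", "B"), ("C", "A"), ("B", "C"), ("B", "A"), ("A", "C"), ("C", "B")]
epr = early_precision_ratio(ranking, truth, n_genes=3)
```

In this example one of the top two predictions is correct (early precision 0.5) and the network density is 2/6, so the ratio is about 1.5. A value of 1 corresponds to random performance.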
Overall findings from benchmarking
PIDC or GENIE3 might be good algorithms for scRNA-seq network inference
Take away points