1 of 34

Inferring regulatory networks from scRNA-seq datasets

Oct 5th 2021

BMI 826-23 Computational Network Biology, Fall 2021

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 34

Plan for this section

  • Overview of network inference (Sep 21st)
  • Bayesian networks (Sep 21st, Sep 23rd)
  • Dependency networks (Sep 23rd)
  • Integrative inference of gene regulatory networks (Sep 28th, Sep 30th)
  • Inference of regulatory networks from single cell datasets (Oct 5th)

3 of 34

Plan for today

  • Brief background into scRNA-seq
  • Algorithms for scRNA-seq
    • PIDC
    • SCENIC
  • Benchmarking scRNA-seq network inference algorithms

4 of 34

scRNA-seq: a powerful technology to understand heterogeneous cell populations

Bulk

Single-cell

Figures: 10X Genomics; Shalek AK et al., Nature 2014

5 of 34

Computational problems with scRNA-seq data

  1. Pre-processing and normalization
  2. Visualization
  3. Cell type identification
  4. Trajectory inference:
    1. Single-cell ordering: pseudotime
    2. Cell population structure relationships
  5. Network inference

6 of 34

Inferring networks from scRNA-seq data

  • Lots of measurements from a single sample
  • Natural cell-to-cell variation should be valuable for network inference

7 of 34

Pseudotime

Bergen et al, 2021

If we think of each cell as capturing a snapshot of a dynamic process, can we order cells based on their transcriptional signatures?

8 of 34

Classes of network inference algorithms

  • Information theoretic methods
    • PIDC
    • knnDREMI
    • SCRIBE
  • ODE based methods
    • SCODE
    • Ocone et al
  • Boolean models
    • BTR
  • Graphical models & Dependency networks
    • SCENIC
    • SCHiRM
    • HurdleNormal
    • SILGGM
  • Correlation-based methods
    • LEAP

Adapted from Babtie et al., Current Opinion in Systems Biology 2017; Stone et al., unpublished

  • Incorporate pseudotime
    • LEAP
    • SCRIBE
    • SCODE
    • Ocone et al
    • SINGE
  • Don’t use pseudotime
    • knnDREMI
    • SCENIC
    • SCHiRM
    • HurdleNormal
    • SILGGM
    • PIDC

How to model an edge?

Pseudotime?

9 of 34

Plan for today

  • Brief background into scRNA-seq
  • Overview of algorithms for scRNA-seq
    • PIDC
    • SCENIC
  • Benchmarking scRNA-seq network inference algorithms

10 of 34

PIDC

  • T. E. Chan, M. P. H. Stumpf, and A. C. Babtie, “Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures,” Cell Systems, vol. 5, no. 3, Sep. 2017, [Online]. Available: http://view.ncbi.nlm.nih.gov/pubmed/28957658.
  • Based on information theory
    • Multivariate information (MVI)
    • Partial Information decomposition (PID)
  • Learns an undirected network by using PID to quantify how information is shared among triplets of genes
    • The authors report that this performs better than methods based on pairwise mutual information (MI)

11 of 34

A few information theoretic concepts

  • Mutual information

  • Conditional mutual information

  • Multivariate information
    • A way to measure information between more than two variables
    • PID and Interaction information are two ways to assess this

  • PID: Partial information decomposition is computed using
    • Redundant information
    • Synergistic information
    • Unique information

12 of 34

RECAP: Mutual Information

  • I(X;Y) is the mutual information between two variables
    • How much does knowing one variable tell us about the other?
  • Mutual information is defined as

$$I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

  • Measures the difference between the two distributions: joint and product of marginals
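
A minimal sketch of this definition for discretized expression values, using the plug-in (frequency-count) estimate. The function name `mutual_information` and the toy gene vectors are hypothetical and chosen only for illustration; real scRNA-seq data would first need to be discretized, and the choice of discretization and estimator affects the result.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts for p(x, y)
    px = Counter(xs)               # counts for p(x)
    py = Counter(ys)               # counts for p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Compare the joint probability with the product of the marginals
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical discretized expression of three genes across 8 cells
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = [0, 0, 1, 1, 0, 0, 1, 1]   # identical to gene_a
gene_c = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of gene_a in this sample

print(mutual_information(gene_a, gene_b))  # 1 bit: fully dependent
print(mutual_information(gene_a, gene_c))  # 0 bits: joint equals product of marginals
```

When the joint distribution equals the product of the marginals, every log term is zero, which is why the second pair scores 0.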

13 of 34

RECAP: Conditional mutual information

  • I(X;Y | Z) is defined as the mutual information between X and Y given a third variable Z

  • Conditional mutual information is defined as

$$I(X;Y \mid Z) = \sum_{x,y,z} p(x,y,z) \log \frac{p(x,y \mid z)}{p(x \mid z)\,p(y \mid z)}$$

14 of 34

Interaction information

  • Let X, Y, Z denote three random variables
  • Interaction Information (II) is defined as the additional information we get when we observe a third variable versus when we don’t
  • Defined as the difference between conditional mutual information and mutual information

$$II(X;Y;Z) = \underbrace{I(X;Y \mid Z)}_{\text{conditional mutual information}} - \underbrace{I(X;Y)}_{\text{mutual information}}$$
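
A small sketch of interaction information, computed from entropies on two toy cases. It assumes the sign convention stated on this slide (conditional mutual information minus mutual information); the helper names and the data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more discrete sample columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mi(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)


def cmi(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    return cmi(x, y, z) - mi(x, y)


# Case 1: Z = X XOR Y over all four equally likely input combinations
pairs = list(product([0, 1], repeat=2))
x = [a for a, _ in pairs]
y = [b for _, b in pairs]
z_xor = [a ^ b for a, b in pairs]
ii_xor = interaction_information(x, y, z_xor)

# Case 2: X, Y and Z are all copies of the same binary variable
same = [0, 1]
ii_copy = interaction_information(same, same, same)

print(ii_xor)   # positive: X and Y are informative about each other only once Z is known
print(ii_copy)  # negative: Z adds nothing beyond what X and Y already share
```

In the XOR case, X and Y are independent but become fully dependent once Z is observed, so the interaction information is +1 bit. In the copy case, the shared information is entirely redundant and the value is -1 bit.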

15 of 34

Partial Information Decomposition

  • A way to measure information between three variables, where two are “source” variables and the third is a target
  • Let S={X, Y} denote the source set and Z the target
  • PID is defined using the unique, synergistic and redundant information between Z and X,Y

16 of 34

Partial Information Decomposition

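  • PID is defined as

$$I(Z; X,Y) = \mathrm{Synergy}(Z; X,Y) + \mathrm{Unique}_Y(Z;X) + \mathrm{Unique}_X(Z;Y) + \mathrm{Redundancy}(Z; X,Y)$$

  • Unique_Y(Z;X) is the unique information between source variable X and target variable Z when the other source variable is Y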
  • For network inference, only the “Unique” and “Redundancy” information is important

17 of 34

Redundancy

  • Redundancy is defined based on the minimal information about Z’s state from the source variables.
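  • First define for each source X (and likewise for Y) the specific information it provides about a state z of the target

$$I_{\mathrm{spec}}(Z=z; X) = \sum_{x} p(x \mid z) \left[ \log \frac{1}{p(z)} - \log \frac{1}{p(z \mid x)} \right]$$

  • Redundancy is

$$\mathrm{Redundancy}(Z; X,Y) = \sum_{z} p(z) \min_{S \in \{X,Y\}} I_{\mathrm{spec}}(Z=z; S)$$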

18 of 34

Unique information

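  • The mutual information between a source and the target can be decomposed into two parts: unique and redundant information

$$I(Z;X) = \mathrm{Unique}_Y(Z;X) + \mathrm{Redundancy}(Z; X,Y)$$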

19 of 34

Synergistic information

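  • Recall interaction information: II(X;Y;Z) = I(X;Y | Z) - I(X;Y)
  • In terms of the PID components, II(X;Y;Z) = Synergy(Z; X,Y) - Redundancy(Z; X,Y)

  • Synergistic information is defined as the information about target Z that is available only when X and Y are considered together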
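
The redundancy, unique and synergy terms from the preceding slides can be sketched for discrete data as follows. This assumes redundancy is the minimum specific information averaged over target states, as described on the Redundancy slide, and uses plug-in probability estimates. The function names and toy data are hypothetical, and this is an illustration rather than the PIDC implementation.

```python
import math
from collections import Counter


def probs(*cols):
    """Empirical joint distribution of the given sample columns."""
    n = len(cols[0])
    return {key: c / n for key, c in Counter(zip(*cols)).items()}


def mutual_info(a, b):
    """Plug-in estimate of I(A;B) in bits."""
    pa, pb, pab = probs(a), probs(b), probs(a, b)
    return sum(p * math.log2(p / (pa[(u,)] * pb[(v,)]))
               for (u, v), p in pab.items())


def specific_info(target, source, z):
    """I_spec(Z=z; S) = sum_s p(s|z) * log2(p(z|s) / p(z))."""
    pz, ps, pzs = probs(target), probs(source), probs(target, source)
    return sum((p / pz[(z,)]) * math.log2((p / ps[(s,)]) / pz[(z,)])
               for (zz, s), p in pzs.items() if zz == z)


def pid(target, x, y):
    """Decompose I(Z; X,Y) into redundancy, unique and synergy terms."""
    redundancy = sum(
        p * min(specific_info(target, x, z), specific_info(target, y, z))
        for (z,), p in probs(target).items())
    unique_x = mutual_info(target, x) - redundancy   # Unique_Y(Z;X)
    unique_y = mutual_info(target, y) - redundancy   # Unique_X(Z;Y)
    total = mutual_info(target, list(zip(x, y)))     # I(Z; X,Y)
    synergy = total - unique_x - unique_y - redundancy
    return {"redundancy": redundancy, "unique_x": unique_x,
            "unique_y": unique_y, "synergy": synergy}


# Hypothetical toy cases over four equally likely cells
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
xor_case = pid([a ^ b for a, b in zip(x, y)], x, y)   # Z = X XOR Y
copy_case = pid(x, x, y)                              # Z = X, Y unrelated

print(xor_case)    # all information about Z is synergistic
print(copy_case)   # all information about Z is unique to X
```

The copy case is the pattern of interest for network inference: when Z depends on X alone, the information shows up as unique to X rather than as redundancy or synergy.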

20 of 34

Unique information can discriminate between edges and non-edges

21 of 34

Using PID for network inference
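
A hedged sketch of how unique information can be turned into a pairwise edge score. It assumes the proportional unique contribution (PUC) described in Chan et al. (2017): for a gene pair (X, Y), sum Unique_Z(X;Y)/I(X;Y) and Unique_Z(Y;X)/I(X;Y) over every other gene Z. The function names, the `eps` cutoff and the toy data are hypothetical, and any later step that converts scores into edge calls is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def probs(*cols):
    """Empirical joint distribution of the given sample columns."""
    n = len(cols[0])
    return {key: c / n for key, c in Counter(zip(*cols)).items()}


def mutual_info(a, b):
    """Plug-in estimate of I(A;B) in bits."""
    pa, pb, pab = probs(a), probs(b), probs(a, b)
    return sum(p * math.log2(p / (pa[(u,)] * pb[(v,)]))
               for (u, v), p in pab.items())


def redundancy(target, s1, s2):
    """Minimum specific information about each target state, averaged over states."""
    pz = probs(target)

    def spec(source, z):
        ps, pzs = probs(source), probs(target, source)
        return sum((p / pz[(z,)]) * math.log2((p / ps[(s,)]) / pz[(z,)])
                   for (zz, s), p in pzs.items() if zz == z)

    return sum(p * min(spec(s1, z), spec(s2, z)) for (z,), p in pz.items())


def unique_info(target, source, other):
    """Unique_other(target; source), clipped at zero against rounding error."""
    return max(mutual_info(target, source) - redundancy(target, source, other), 0.0)


def puc_scores(data, eps=1e-12):
    """data maps gene name -> list of discretized expression values per cell."""
    scores = {}
    for gx, gy in combinations(sorted(data), 2):
        mi_xy = mutual_info(data[gx], data[gy])
        total = 0.0
        if mi_xy > eps:  # skip pairs with no shared information
            for gz in data:
                if gz not in (gx, gy):
                    total += unique_info(data[gx], data[gy], data[gz]) / mi_xy
                    total += unique_info(data[gy], data[gx], data[gz]) / mi_xy
        scores[(gx, gy)] = total
    return scores


# Hypothetical chain-like toy data: B mostly follows A, C mostly follows B
toy = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 1, 1, 1, 1, 0],
    "C": [0, 0, 1, 1, 1, 1, 0, 0],
}
scores = puc_scores(toy)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy example the pairs A-B and B-C score higher than A-C, which matches the chain used to construct the data.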

 

22 of 34

PIDC on simulated data

23 of 34

Application of PIDC to single cell data

24 of 34

Plan for today

  • Brief background into scRNA-seq
  • Algorithms for scRNA-seq
    • PIDC
    • SCENIC
  • Benchmarking scRNA-seq network inference algorithms

25 of 34

SCENIC

  • S. Aibar et al., “SCENIC: single-cell regulatory network inference and clustering,” Nature Methods, vol. advance online publication, Oct. 2017, doi: 10.1038/nmeth.4463.
  • Based on three tools
    • GENIE3, RcisTarget, AUCell
  • Incorporates TF motif binding information to infer physical interactions

26 of 34

SCENIC workflow

Use GENIE3 to predict target genes of each regulator.

These targets are likely co-expressed with the regulator.

Use RcisTarget to identify which targets have binding-motif support for the regulator. A regulator together with its filtered targets is called a regulon.

Use AUCell to assess how active a regulon is in each cell.
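
To make the last step concrete, here is a simplified sketch of a rank-based regulon activity score in the spirit of AUCell: genes in a cell are ranked by expression, and the score is the normalized area under the recovery curve of regulon genes within the top-ranked fraction. The function `regulon_auc`, the `top_fraction` parameter and the toy cells are hypothetical, and the AUCell package may differ in details such as tie handling and normalization.

```python
def regulon_auc(expression, regulon, top_fraction=0.05):
    """Normalized area under the recovery curve for one cell.

    expression: dict mapping gene -> expression value in this cell
    regulon: set of genes in the regulon
    top_fraction: fraction of the ranking over which the curve is evaluated
    """
    # Rank genes from highest to lowest expression (ties broken by name)
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    cutoff = max(1, round(top_fraction * len(ranked)))

    hits = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in regulon:
            hits += 1
        area += hits  # height of the recovery curve at this rank

    # Best case: all regulon genes sit at the very top of the ranking
    n_best = min(len(regulon), cutoff)
    max_area = sum(min(k, n_best) for k in range(1, cutoff + 1))
    return area / max_area if max_area else 0.0


regulon = {"g1", "g2"}
cell_1 = {"g1": 9, "g2": 7, "g3": 5, "g4": 0, "g5": 0, "g6": 1}
cell_2 = {"g1": 0, "g2": 1, "g3": 8, "g4": 9, "g5": 7, "g6": 0}

# A large top_fraction is used only because the toy cells have six genes
auc_1 = regulon_auc(cell_1, regulon, top_fraction=0.5)
auc_2 = regulon_auc(cell_2, regulon, top_fraction=0.5)
print(auc_1, auc_2)  # regulon active in cell_1, inactive in cell_2
```

Because the score depends only on the within-cell ranking, it is less sensitive to per-cell differences in sequencing depth than raw expression values.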

27 of 34

Evaluation of SCENIC

  • Can SCENIC predict key regulators of different cell types?
  • Does cell clustering based on SCENIC AUCell scores agree with known cell types?
  • Assess performance on a mouse brain scRNA-seq dataset with known cell types

28 of 34

Application of SCENIC on a dataset with known cell labels

SCENIC accurately predicts regulators for known cell types. Clustering of cell types is more accurate when using SCENIC regulons

29 of 34

Plan for today

  • Brief background into scRNA-seq
  • Algorithms for scRNA-seq
    • PIDC
    • SCENIC
  • Benchmarking scRNA-seq network inference algorithms

30 of 34

Benchmarking single-cell network inference algorithms

A. Pratapa, A. P. Jalihal, J. N. Law, A. Bharadwaj, and T. M. Murali, “Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data,” Nat Methods, vol. 17, no. 2, pp. 147–154, Feb. 2020, doi: 10.1038/s41592-019-0690-6.

31 of 34

Performance of algorithms on simulated data

Different network topologies

Number of genes: 7-18

32 of 34

Performance on real data

Each number is the Early Precision Ratio (EPR); higher is better.
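
A short sketch of how the Early Precision Ratio can be computed, assuming the definition used in the benchmarking paper: early precision is the fraction of true edges among the top-k predictions, where k is the number of edges in the reference network, and the ratio divides this by the early precision expected from a random predictor (the network density). The function name and the toy network are hypothetical.

```python
def early_precision_ratio(ranked_edges, true_edges, n_possible_edges):
    """EPR for a ranked list of predicted edges (best first)."""
    true_edges = set(true_edges)
    k = len(true_edges)
    # Fraction of the top-k predictions that are in the reference network
    early_precision = sum(edge in true_edges for edge in ranked_edges[:k]) / k
    # A random predictor is expected to match the network density
    random_precision = k / n_possible_edges
    return early_precision / random_precision


genes = ["g1", "g2", "g3", "g4"]
n_possible = len(genes) * (len(genes) - 1)   # directed edges, no self-loops

reference = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
predicted = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"),
             ("g4", "g1"), ("g3", "g4")]

epr = early_precision_ratio(predicted, reference, n_possible)
print(round(epr, 2))  # 2.67: two of the top three predictions are true edges
```

An EPR of 1 corresponds to random performance, so values above 1 indicate that the top of the ranking is enriched for true edges.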

33 of 34

Overall findings from benchmarking

PIDC or GENIE3 might be good algorithms for scRNA-seq network inference

34 of 34

Take away points

  • Inference of regulatory networks from scRNA-seq data is in its early stages
    • Although a lot of algorithms exist, there is room for improvement
  • Algorithms differ based on
    • Pseudotime incorporation
    • Model of interaction
  • Information theoretic methods might be a promising direction to explore for scRNA-seq network inference