1 of 70

Dependency networks

Sep 19th 2024

BMI/CS 775 Computational Network Biology, Fall 2023

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 70

Plan for this section

  • Representing molecular networks as probabilistic models (Sep 10th)
  • Learning Bayesian networks and extensions (Sep 10th,12th, 17th)
    • Learning Dynamic Bayesian Networks
    • Adding priors
  • Learning Dependency networks (Sep 19th)
  • Learning causal graphs (Sep 24th, 26th)


3 of 70

Readings

  • Huynh-Thu, Vân A., Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts. 2010. “Inferring Regulatory Networks from Expression Data Using Tree-Based Methods.” PLOS ONE 5 (9): e12776+. https://doi.org/10.1371/journal.pone.0012776.

4 of 70

Let’s start out a bit abstractly


        E1     E2     E3     E4     E5     E6     E7
G1      8.82   9.09   6.43   8.11   7.13   7.55   9.18
G2      0.0    0.15   0.07   0.0    0.08   0.0    0.0
G3      0.0    0.0    0.0    0.0    0.1    0.08   0.26
G4      2.83   1.92   2.33   0.86   2.17   2.53   3.19
G5      2.0    1.32   1.13   0.72   1.25   1.17   2.27
G6      12.41  10.72  10.23  8.59   8.08   8.92   11.61
G7      0.02   0.68   0.0    0.0    0.0    0.0    0.0
G8      0.69   0.95   1.09   0.97   0.71   0.44   0.76
G9      0.11   0.22   0.12   0.21   0.24   0.2    0.43

  • How can we infer what the graph might be?

  • What assumptions do we need to make?

  • What will such a graph capture?

Suppose we had a large matrix with the nodes of an unobserved graph as rows and the measurements of node activity as columns.

5 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Evaluation of network inference methods
  • Extensions to Dependency networks

6 of 70

Recall the different types of probabilistic graphical models

  • In each graph type we can assert different conditional independencies
  • Correlation networks
  • Markov networks
    • Gaussian Graphical models
  • Dependency networks
  • Bayesian networks

7 of 70

Dependency network

  • A type of probabilistic graphical model
  • Approximates a Markov network
    • Much easier to learn from data
  • As in a Bayesian network, has
    • Graph structure
    • Parameters
  • Unlike a Bayesian network
    • Can have cyclic dependencies
    • Computing a joint probability is harder
      • It is approximated with a “pseudo” likelihood

Dependency Networks for Inference, Collaborative Filtering, and Data Visualization

Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

8 of 70

Original motivation of dependency networks

  • Introduced by Heckerman, Chickering, Meek, et al 2000
  • Often, Bayesian networks can be confusing
    • Learned Bayesian networks represent correlational or predictive relationships
    • But the directionality of the edges is often mistakenly interpreted as causal
  • (Consistent) Dependency networks were introduced to distinguish between these cases

9 of 70

Dependency network vs Bayesian network

Dependency Networks for Inference, Collaborative Filtering, and Data Visualization

Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

Bayesian network

Dependency network

Often, the Bayesian network on the left is read as if “Age” determines “Income”. However, all this model captures is that “Age” is predictive of “Income”.

10 of 70

Markov blanket

  • Let X be the set of all random variables
  • Let Bj denote the Markov Blanket of a variable Xj.
  • X-j denotes all variables in X other than Xj
  • The Markov blanket of Xj, Bj, is the subset of X that makes Xj independent of all other variables: P(Xj | X-j) = P(Xj | Bj)

  • Bj captures all the information about Xj

11 of 70

Learning dependency networks

?

?

?

Bj

  • Given Bj, Xj is independent of all other variables, X-j

  • Bj can be estimated by finding the set of variables that best predict Xj
  • This requires us to specify the form of P(Xj|Bj)

fj=P(Xj|Bj)

Entails estimating the Markov blanket of each random variable

Xj
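Estimating each Markov blanket by regression can be sketched in a few lines. This is a hedged illustration, not any specific published method: scikit-learn's Lasso stands in for a sparse predictor, and the data, dependence structure, and alpha value are all invented.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Invented data: N samples x p variables; variable 2 depends on 0 and 1.
rng = np.random.default_rng(0)
N, p = 100, 5
X = rng.normal(size=(N, p))
X[:, 2] = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=N)

# For each variable Xj, regress it on all others X_{-j}; the predictors
# with nonzero coefficients approximate the Markov blanket B_j.
blankets = {}
for j in range(p):
    others = [i for i in range(p) if i != j]
    model = Lasso(alpha=0.1).fit(X[:, others], X[:, j])
    blankets[j] = [others[i] for i, c in enumerate(model.coef_) if abs(c) > 1e-6]

print(blankets[2])  # variables 0 and 1 should appear in the blanket of 2
```

Each variable gets its own regression, so the p problems can be solved independently (and in parallel), which is exactly why dependency networks are easier to learn than Markov networks.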

12 of 70

Different representations of fj=P(Xj|Bj)

  • If Xj is continuous
    • fj can be a linear function
    • fj can be a regression tree
    • fj can be an ensemble of trees
      • E.g. random forests

  • If Xj is discrete
    • fj can be a conditional probability table
    • fj can be a conditional probability tree

13 of 70

Popular dependency networks implementations

  • Learned by solving a set of linear regression problems
    • TIGRESS (Haury et al., 2012)
      • Uses a constraint to learn a “sparse” Markov blanket
      • Uses “stability selection” to estimate confidence of edges
    • Inferelator (Bonneau et al., Genome Biology 2006)
      • Can handle time course and single time point data
      • Handles linear and non-linear regression
  • Learned by solving a set of non-linear regression problems
    • Non-linearity captured by Regression Tree (Heckerman et al, 2000)
    • GENIE3: Non-linearity captured by Random forest (Huynh-Thu et al, 2010)

14 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Evaluation of network inference methods
  • Extensions to Dependency networks

15 of 70

Expression data matrix

p Genes

                      E1     E2     E3     E4     E5     E6     E7
ENSMUSG00000028180    8.82   9.09   6.43   8.11   7.13   7.55   9.18
ENSMUSG00000053211    0.0    0.15   0.07   0.0    0.08   0.0    0.0
ENSMUSG00000028182    0.0    0.0    0.0    0.0    0.1    0.08   0.26
ENSMUSG00000002017    2.83   1.92   2.33   0.86   2.17   2.53   3.19
ENSMUSG00000028184    2.0    1.32   1.13   0.72   1.25   1.17   2.27
ENSMUSG00000028187    12.41  10.72  10.23  8.59   8.08   8.92   11.61
ENSMUSG00000028186    0.02   0.68   0.0    0.0    0.0    0.0    0.0
ENSMUSG00000028189    0.69   0.95   1.09   0.97   0.71   0.44   0.76
ENSMUSG00000028188    0.11   0.22   0.12   0.21   0.24   0.2    0.43

N Experiments/Time points

Observations (expression levels) of all variables in sample i, x(i)

Observations of variable Xj in all N experiments

 

16 of 70

GENIE3: GEne Network Inference with Ensemble of trees

  • Solves a set of regression problems
    • One per random variable
    • Minimizes the prediction error per variable Xj

  • Uses an Ensemble of regression trees to represent fj
    • Models non-linear dependencies
  • Outputs a directed cyclic graph with a confidence of each edge
    • Directionality means “good predictor”
  • Focus on a ranking over edges rather than graph structure and parameters
    • Rank is determined by confidence

Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. Van Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, Pierre Geurts, PLOS ONE 2010

17 of 70

A very simple regression tree for two variables

[Figure: a simple regression tree over X2 and X3 — interior nodes test X2 against thresholds (X2 > e1; if NO, X2 > e2), and leaf nodes hold values of X3. The tree specifies X3 as a function of X2.]

18 of 70

Prediction of Xj using a single tree

[Figure: a single regression tree — the input x-j is routed through the split nodes to a leaf node; the prediction is the mean at that leaf. Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial]

23 of 70

Algorithm for learning a regression tree

  • Input: dataset D, variable Xj, candidate predictors X-j of Xj
  • Output: Tree T
  • Initialize T to a leaf node, estimated from all samples of Xj. Assign all samples to leaf node
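scikit-learn's DecisionTreeRegressor implements this kind of CART-style, variance-reducing tree growth. A sketch on invented data (the threshold 0.5, the noise level, and all variable names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Invented data: predict X3 from candidate predictors X1, X2, X4.
rng = np.random.default_rng(1)
N = 200
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
x4 = rng.normal(size=N)
# Nonlinear target: X3 is high only when X1 exceeds a threshold.
x3 = np.where(x1 > 0.5, 2.0, 0.0) + 0.1 * rng.normal(size=N)

X_others = np.column_stack([x1, x2, x4])
# The tree is grown greedily: at each node, the split that most reduces
# the variance of the target is chosen. min_samples_leaf and max_depth
# are stopping criteria of the kind listed on the convergence slide.
tree = DecisionTreeRegressor(min_samples_leaf=10, max_depth=3).fit(X_others, x3)

# The root split should be on x1 (feature index 0), near 0.5.
print(tree.tree_.feature[0], tree.tree_.threshold[0])
```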

24 of 70

Algorithm for learning a regression tree

  • Repeat until a stopping criterion is met:
    • For each current leaf node, evaluate candidate splits (a predictor Xi and a threshold) on the samples assigned to that leaf
    • Apply the best-scoring split, turning the leaf into an interior node with two new leaves
    • Assign the leaf’s samples to the new left/right leaves

25 of 70

Quantifying a split on the tree

  • The quality of a split s on variable Xi is the reduction in the variance of Xj it achieves:

    #S·Var(S) − #Sleft·Var(Sleft) − #Sright·Var(Sright)

  • The best (Xi, s) pair is the one maximizing this reduction

The set of samples in the left node

The set of samples in the right node

Sleft and Sright are sets of samples obtained by testing Xi for a particular split s

26 of 70

One iteration of regression tree learning

  • Let X={X1,X2,X3,X4}
  • Assume we are searching for the neighbors of X3 and it already has two neighbors X1 and X2

[Figure, iteration i: the current tree for X3 splits on X1 > e1 and then X2 > e2, with leaves N1, N2, N3; Nl is the Gaussian associated with leaf l.]

X1, X2, X4 will all be considered as candidate splits using the examples at each current leaf node.

27 of 70

One iteration of regression tree learning

  • If we split on X4, then we will have a new neighbor.

[Figure: at iteration i the tree for X3 splits on X1 > e1 and X2 > e2 (leaves N1, N2, N3); after splitting a leaf on X4 > e3 at iteration i+1, the tree has leaves N2, N3, N4, N5. Nl is the Gaussian associated with leaf l.]

28 of 70

Convergence criteria

  • Minimum number of examples at a leaf node
  • Depth of a tree
  • Error tolerance

29 of 70

An Ensemble of trees

  • A single tree is prone to “overfitting”
  • Instead of learning a single tree, ensemble models make use of a collection of trees

30 of 70

Prediction using an Ensemble of Trees

[Figure: an ensemble of T regression trees, tree1 … treeT — each tree routes the input x-j through its split nodes to a leaf node. Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial]

34 of 70

Prediction using an Ensemble of Trees

    • Prediction is the average of the T individual tree predictions: (1/T)·Σt ft(x-j)

[Figure: the same input x-j is dropped down each of tree1 … treeT, and the T leaf predictions are averaged. Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial]

35 of 70

Expression data matrix for GENIE3

p Genes

N Experiments/Time points etc

Observations (expression levels) of all variables in sample i

Observations of Xj in all N experiments

GENIE3 will use the transpose of this data matrix

N Experiments

p Genes

Observations (expression levels) of all variables in sample i

36 of 70

GENIE3 algorithm sketch

Figure from Huynh-Thu et al., 2010

Predictor ranking

37 of 70

GENIE3 algorithm sketch

  • For each Xj, generate learning samples of input/output pairs
    • LSj={(x-jk,xjk), k=1..N}
    • On each LSj learn fj to predict the value of Xj
    • fj is an ensemble of regression trees
    • Estimate importance wij for all genes i ≠ j
      • wij quantifies the confidence of the edge between Xi and Xj
      • Associated with the decrease in variance of Xj when Xi is included in fj
  • Generate a global ranking of edges based on each wij
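The loop above can be sketched with scikit-learn's RandomForestRegressor standing in for the paper's ensemble (a simplification: sklearn normalizes importances to sum to 1 per model, whereas GENIE3 uses raw variance reductions, and 100 trees are used here instead of 1000; the toy matrix E and its dependence structure are invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy expression matrix: N samples x p genes; gene 3 is regulated by gene 0.
rng = np.random.default_rng(2)
N, p = 100, 4
E = rng.normal(size=(N, p))
E[:, 3] = np.tanh(E[:, 0]) + 0.1 * rng.normal(size=N)

# GENIE3-style loop: one forest per target gene; the importance of
# predictor i for target j becomes the weight w_ij of edge i -> j.
edges = []
for j in range(p):
    others = [i for i in range(p) if i != j]
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(E[:, others], E[:, j])
    for i, w in zip(others, rf.feature_importances_):
        edges.append((i, j, w))

# Global edge ranking by importance weight.
edges.sort(key=lambda e: -e[2])
print(edges[0][:2])  # the top-ranked edge should connect genes 0 and 3
```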

38 of 70

Learning fj in GENIE3

  • Uses two types of Ensembles to represent the fj:
    • Random forest or Extra Trees
  • Learning the Random forest
    • Generate M=1000 bootstrap samples and learn 1000 trees, one from each sample
    • At each node to be split, search for best split among K randomly selected variables
    • K was set to p−1 or √(p−1), where p is the number of candidate regulators/parents
  • Learning the Extra-Trees
    • Learn 1000 trees
    • Each tree is built from the original learning sample
    • At each node, the best split is determined among K random splits, each split determined by randomly selecting one input (without replacement) and a threshold

39 of 70

Computing the importance weight of a predictor

  • Importance is computed at each interior node
    • Remember each predictor can show up multiple times as interior nodes
  • For an interior node, importance is given by the reduction in variance when splitting on that node

For an interior node N, with S the set of data samples that reach it:
  #S: size of the set S
  Var(S): variance of the output variable xj in set S
  St: subset of S for which the test at N is true
  Sf: subset of S for which the test at N is false

40 of 70

Computing the importance weight of a predictor

  • For a single tree, the overall importance of a predictor is the sum over all interior nodes where that predictor is used to split
  • For an ensemble, the importance is averaged over all trees
  • To avoid bias towards highly variable genes, normalize the expression of each gene to unit variance
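The per-node importance can be computed directly from the quantities defined above (#S, Var(S), St, Sf); a minimal numeric check with invented values:

```python
import numpy as np

# Invented target values xj for the samples reaching an interior node N,
# and the outcome of the node's split test for each sample.
xj = np.array([0.1, 0.2, 0.15, 2.0, 2.1, 1.9])
test = np.array([True, True, True, False, False, False])

S_t, S_f = xj[test], xj[~test]
# Importance = #S*Var(S) - #St*Var(St) - #Sf*Var(Sf): the reduction in
# (size-weighted) variance achieved by splitting on this test.
importance = (len(xj) * np.var(xj)
              - len(S_t) * np.var(S_t)
              - len(S_f) * np.var(S_f))
print(importance)
```

A split that cleanly separates the low and high values of xj, as here, removes almost all of the node's variance and so gets a large importance.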

41 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Evaluation of network inference methods
  • Extensions to Dependency networks

42 of 70

Evaluating the network

  • Assessing confidence
  • Area under the precision recall curve
  • F-score
  • Do gene targets participate in coherent function?
  • Can the network predict expression in a new condition?

43 of 70

Assessing confidence in the learned network

  • Typically the number of training samples is not sufficient to reliably determine the “right” network
  • One can however estimate the confidence of specific features of the network
    • Graph features f(G)
  • Examples of f(G)
    • An edge between two random variables
    • Order relations: is X an ancestor of Y?

44 of 70

How to assess confidence in graph features?

  • What we want is P(f(G)|D), which is a sum over all possible graphs:

    P(f(G)|D) = Σ_G f(G)·P(G|D)

  • But it is not feasible to compute this sum over the super-exponential number of graphs

  • Instead we will use a “bootstrap” procedure

45 of 70

Bootstrap to assess graph feature confidence

  • For i=1 to m
    • Construct dataset Di by sampling with replacement N samples from dataset D, where N is the size of the original D
    • Learn a graphical model {Gi,Θi}
  • For each feature of interest f, calculate its confidence as the fraction of bootstrap networks in which it appears: conf(f) = (1/m)·Σᵢ f(Gᵢ)
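This loop can be sketched with a toy correlation-threshold “learner” standing in for any graphical-model learner (the learner, the threshold, and the data are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def learn_graph(data, threshold=0.6):
    """Toy learner: edge i-j is present if |correlation| exceeds threshold."""
    corr = np.corrcoef(data, rowvar=False)
    return np.abs(corr) > threshold

# Invented data: genes 0 and 1 strongly correlated, gene 2 independent.
N, p = 80, 3
D = rng.normal(size=(N, p))
D[:, 1] = D[:, 0] + 0.3 * rng.normal(size=N)

m = 50
counts = np.zeros((p, p))
for _ in range(m):
    idx = rng.integers(0, N, size=N)   # sample N rows with replacement
    counts += learn_graph(D[idx])
confidence = counts / m                # fraction of bootstrap graphs containing each edge

print(round(confidence[0, 1], 2))      # near 1.0 for the real dependence
```

The real edge is recovered in essentially every bootstrap sample, while spurious edges appear only rarely — the contrast the randomization check on the next slides makes explicit.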

46 of 70

Bootstrap/stability selection

Input: Expression data → Output: Final inferred network

48 of 70

Does the bootstrap confidence represent real relationships?

  • Compare the confidence distribution to that obtained from randomized data
  • Shuffle the columns of each row (gene) separately
  • Repeat the bootstrap procedure

[Figure: a genes × experimental conditions matrix — randomize each row independently.]

Slide credit Prof. Mark Craven

49 of 70

Bootstrap-based confidence differs between real and random data

[Figure: distributions of bootstrap confidence for features f in random vs. real data.]

Friedman et al 2000

50 of 70

Example of a high confidence sub-network

One learned Bayesian network

Bootstrapped confidence Bayesian network: highlights a subnetwork associated with yeast mating pathway. Colors indicate genes with known functions.

Nir Friedman, Science 2004

51 of 70

Area under the precision recall curve (AUPR)

  • Assume we know what the “right” network is
  • One can use Precision-Recall curves to evaluate the predicted network
  • Area under the PR curve (AUPR) curve quantifies performance

Precision = (# of correct edges) / (# of predicted edges)

Recall = (# of correct edges) / (# of true edges)

52 of 70

F-score

  • Assume we know what the “right” network is
  • Suppose your predicted network has N edges
    • Selected by sorting edges by confidence/strength of interaction and taking the top N edges

Precision (P) = (# of correct edges) / (# of predicted edges)

Recall (R) = (# of correct edges) / (# of true edges)

F-score = 2·P·R / (P + R)
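These quantities are easy to compute from edge sets; a tiny worked example (the edges are invented):

```python
# Evaluate a predicted network against a known "right" network.
# Edges are (regulator, target) pairs; all values are illustrative.
true_edges = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}
predicted_edges = {("A", "B"), ("B", "D"), ("B", "C")}

correct = true_edges & predicted_edges
precision = len(correct) / len(predicted_edges)   # correct / predicted = 2/3
recall = len(correct) / len(true_edges)           # correct / true = 2/4
f_score = 2 * precision * recall / (precision + recall)

print(precision, recall, round(f_score, 3))       # F-score = 4/7 ≈ 0.571
```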

53 of 70

Experimental datasets to assess network structure for gene regulatory networks

    • Sequence specific motifs

    • ChIP-chip and ChIP-seq

    • Factor knockout followed by whole-transcriptome profiling

Siahpirani et al., 2016

54 of 70

AUPR based performance comparison

Results are for E. coli transcriptional regulatory network

55 of 70

DREAM: Dialogue for Reverse Engineering Assessments and Methods

Community effort to assess regulatory network inference

DREAM 5 challenge

Previous challenges: 2006, 2007, 2008, 2009, 2010

Marbach et al. 2012, Nature Methods

56 of 70

Where do different methods rank?

Marbach et al., 2012

Community

Random

These approaches were mostly per-gene

57 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Evaluation of network inference methods
  • Extensions to dependency networks

58 of 70

Extensions to Dependency networks

  • Dependency networks have been extended to incorporate other sources of information
  • These methods can be grouped into those
    • That constrain the structure
      • Inferelator: BBSR, MEN (Greenfield et al., Bioinformatics 2013)
      • Lirnet (Lee et al., PLoS Computational Biology 2009)
      • MERLIN-P (Siahpirani et al.,2016)
      • iRafNet (Petralia et al., 2015)
    • That estimate hidden activity levels of regulators
      • mLASSO-StARS (Miraldi et al., 2018)
      • NetREX (Wang et al., 2018)
      • MERLIN-P-TFA (Carriel & Halberg et al., 2022; Siahpirani et al., In preparation)

59 of 70

Extensions to Dependency networks

[Figure: Input — expression data (RNA-seq/microarray) plus additional prior data (sequence motifs, ATAC-seq, ChIP-seq); Model — constrain structure or estimate hidden TF activity; Output — a network of regulator → target edges. Methods: MERLIN-P, Inferelator, iRafNet, BBSR+TFA, NetREX, MERLIN-P-TFA.]

BBSR+TFA, Arrieta‐Ortiz, Hafemeister, et al. Mol Syst Biol. 2015
60 of 70

Regularized regression

  • Regularized regression is a popular way to incorporate priors/constraints in a graph learning problem
  • The regularized regression framework can be generally described as:

    min over β of ||y − Xβ||² + λ·f(β)

    where f(β) is the regularization term; depending upon f we get different types of regularized regression

61 of 70

Regularized regression

  • f(β) takes the form of some norm of the coefficient vector β
  • L1 norm: ||β||₁ = Σⱼ |βⱼ|

  • L2 norm: ||β||₂² = Σⱼ βⱼ²

  • Elastic net (Zou & Hastie 2005): λ₁||β||₁ + λ₂||β||₂²

62 of 70

Elastic net regression in Inferelator

  • Elastic net regression objective for the ith gene:

    min over βᵢ of ||yᵢ − Xβᵢ||² + λ₁||βᵢ||₁ + λ₂||βᵢ||₂²

Greenfield, Alex, Christoph Hafemeister, and Richard Bonneau. 2013. “Robust Data-Driven Incorporation of Prior Knowledge into the Inference of Dynamic Regulatory Networks.” Bioinformatics (Oxford, England) 29 (8): 1060–1067. https://doi.org/10.1093/bioinformatics/btt099.
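A hedged sketch of one such per-gene fit using scikit-learn's ElasticNet (the data, the penalty values, and the dependence structure are invented; the Inferelator's actual objective and tuning differ in details):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Invented data: target gene regulated by candidate regulators 0 and 2.
rng = np.random.default_rng(4)
N, p = 100, 6
X = rng.normal(size=(N, p))                                    # regulator expression
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=N)   # target gene

# alpha scales the total penalty; l1_ratio balances the L1 vs L2 terms.
# In practice both would be chosen by cross-validation (ElasticNetCV).
model = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X, y)
selected = np.flatnonzero(np.abs(model.coef_) > 1e-6)
print(selected)  # regulators retained by the sparse fit
```

The L1 term gives sparsity (a small Markov blanket); the L2 term stabilizes the fit when regulators are correlated, which is common in expression data.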

63 of 70

Elastic net regression in Inferelator

  • Which can be equivalently written in constrained form:

    Minimize ||yᵢ − Xβᵢ||²
    Subject to (1−α)·||βᵢ||₁ + α·||βᵢ||₂² ≤ t

  where the bound t (and the L1/L2 mixing weight α) are estimated via cross-validation

64 of 70

Modified Elastic Net (MEN) in Inferelator

  • The modification to Elastic net: each coefficient gets its own penalty weight θₚ in the L1 term, Σₚ θₚ·|βᵢₚ|
  • Set θₚ < 1 so that if there is a prior edge xₚ → yᵢ, the regression coefficient βᵢₚ is penalized less

65 of 70

Do priors help?

Siahpirani & Roy, Nucleic Acids Res, Volume 45, Issue 4, 28 February 2017, Page e21, https://doi.org/10.1093/nar/gkw963

Evaluation on the yeast regulatory network

66 of 70

Estimating the activity of a transcription factor


Can we relax this assumption?

Transcription Factor Activity (TFA) estimation

67 of 70

Network component analysis (NCA) for TFA estimation


Liao et al. PNAS 2003; Arrieta‐Ortiz, Hafemeister, et al. Mol Syst Biol. 2015

68 of 70

TFA estimation and network inference


Expression + Input network → NCA → Estimated TFA → Network Inference → Inferred network
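The TFA-estimation step can be sketched as a least-squares solve of E ≈ C·A for the hidden activities A, given a fixed connectivity/control-strength matrix C from the prior network. This shows only that inner step — NCA itself alternates updates of C and A under identifiability constraints — and all dimensions and data here are invented:

```python
import numpy as np

# E: genes x samples expression; C: genes x TFs connectivity from a prior
# network; A: hidden TF activities (TFs x samples) to be estimated.
rng = np.random.default_rng(5)
genes, tfs, samples = 20, 3, 10
C = rng.normal(size=(genes, tfs))
A_true = rng.normal(size=(tfs, samples))
E = C @ A_true + 0.01 * rng.normal(size=(genes, samples))

# Least-squares TFA estimate: A_hat = pinv(C) @ E minimizes ||E - C A||_F.
A_hat = np.linalg.pinv(C) @ E
print(np.allclose(A_hat, A_true, atol=0.1))  # recovered activities are close
```

The estimated TFA matrix then replaces raw TF expression as the predictor input for the downstream network-inference step, relaxing the assumption that a TF's mRNA level reflects its activity.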

69 of 70

Key properties of dependency network algorithms with priors

Algorithm      General model          Prior incorporated how?                           Prior used for?
BBSR/MEN       Linear regression      Regularized regression with prior on parameters   Constraining the structure
MERLIN+P       Linear regression      Prior on graph structure                          Constraining the structure
iRafNet        Tree-based regression  Biased sampling                                   Constraining the structure
mLASSO-StARS   Linear regression      Regularized regression with prior on parameters   Constraining the structure and estimating hidden regulator activities
NetREX         Linear regression      Prior on structure                                Constraining the structure and estimating hidden regulator activities

70 of 70

Take away points

  • Dependency networks learn graph structures by solving a set of regression problems
    • Easier to learn than Bayesian networks, but harder to reason with joint probabilities
    • In practice, perform better than Bayesian networks when evaluating structure recovery
  • Algorithms differ based on how they estimate the predictive regulators
  • Common Dependency network inference algorithms using expression alone
    • GENIE3
    • Inferelator
  • Dependency networks + priors
    • Modified Elastic Net, BBSR
    • MERLIN-P
    • iRafNet
    • mLASSO-StARS