Dependency networks
Sep 22nd 2022
BMI/CS 775 Computational Network Biology, Fall 2022
Sushmita Roy
Plan for this section
Readings
Let’s start out a bit abstractly
| Node | E1 | E2 | E3 | E4 | E5 | E6 | E7 |
G1 | 8.82 | 9.09 | 6.43 | 8.11 | 7.13 | 7.55 | 9.18 |
G2 | 0.0 | 0.15 | 0.07 | 0.0 | 0.08 | 0.0 | 0.0 |
G3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.08 | 0.26 |
G4 | 2.83 | 1.92 | 2.33 | 0.86 | 2.17 | 2.53 | 3.19 |
G5 | 2.0 | 1.32 | 1.13 | 0.72 | 1.25 | 1.17 | 2.27 |
G6 | 12.41 | 10.72 | 10.23 | 8.59 | 8.08 | 8.92 | 11.61 |
G7 | 0.02 | 0.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
G8 | 0.69 | 0.95 | 1.09 | 0.97 | 0.71 | 0.44 | 0.76 |
G9 | 0.11 | 0.22 | 0.12 | 0.21 | 0.24 | 0.2 | 0.43 |
…
Nodes
Node measurements
Suppose we had a large matrix with the nodes of an unobserved graph as rows and the measurements of node activity as columns.
Goals for today
Recall the different types of probabilistic graphs
Dependency network
Dependency Networks for Inference, Collaborative Filtering, and Data Visualization
Heckerman, Chickering, Meek, Rounthwaite, and Kadie, 2000
Original motivation of dependency networks
Dependency network vs Bayesian network
Bayesian network
Dependency network
Often, the Bayesian network on the left is read as if "Age" determines "Income". However, all this model captures is that "Age" is predictive of "Income".
Markov blanket
Learning dependency networks
(Diagram: an unobserved dependency network; each variable's neighbor set is to be estimated.)
fj = P(Xj | Bj)
Learning a dependency network entails estimating Bj, the Markov blanket of each random variable Xj.
Different representations of fj=P(Xj|Bj)
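As a sketch of what learning entails: each variable is regressed on all the others, and the strongest predictors form its candidate Markov blanket. A minimal illustration, assuming a plain linear model as the representation of fj (the toy data and variable indices are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 variables, 100 samples; X2 is a function of X0 and X1.
X = rng.normal(size=(100, 5))
X[:, 2] = 0.8 * X[:, 0] - 0.6 * X[:, 1] + 0.1 * rng.normal(size=100)

def markov_blanket_weights(X, j):
    """Regress variable j on all others; |coefficient| scores each candidate."""
    mask = np.arange(X.shape[1]) != j
    coef, *_ = np.linalg.lstsq(X[:, mask], X[:, j], rcond=None)
    w = np.zeros(X.shape[1])
    w[mask] = np.abs(coef)
    return w

w = markov_blanket_weights(X, 2)
top = np.argsort(w)[::-1][:2].tolist()
print(sorted(top))  # [0, 1]: X0 and X1 are recovered as the blanket of X2
```

Any regression function with a notion of predictor strength (trees, penalized linear models) can stand in for the least-squares fit here.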
Popular dependency networks implementations
Goals for today
Expression data matrix
p Genes
| Gene | E1 | E2 | E3 | E4 | E5 | E6 | E7 |
ENSMUSG00000028180 | 8.82 | 9.09 | 6.43 | 8.11 | 7.13 | 7.55 | 9.18 |
ENSMUSG00000053211 | 0.0 | 0.15 | 0.07 | 0.0 | 0.08 | 0.0 | 0.0 |
ENSMUSG00000028182 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.08 | 0.26 |
ENSMUSG00000002017 | 2.83 | 1.92 | 2.33 | 0.86 | 2.17 | 2.53 | 3.19 |
ENSMUSG00000028184 | 2.0 | 1.32 | 1.13 | 0.72 | 1.25 | 1.17 | 2.27 |
ENSMUSG00000028187 | 12.41 | 10.72 | 10.23 | 8.59 | 8.08 | 8.92 | 11.61 |
ENSMUSG00000028186 | 0.02 | 0.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ENSMUSG00000028189 | 0.69 | 0.95 | 1.09 | 0.97 | 0.71 | 0.44 | 0.76 |
ENSMUSG00000028188 | 0.11 | 0.22 | 0.12 | 0.21 | 0.24 | 0.2 | 0.43 |
…
N Experiments/Time points
Observations (expression levels) of all variables in sample i, x(i)
Observations of variable Xj in all N experiments
GENIE3: GEne Network Inference with Ensemble of trees
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. Van Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts, PLoS ONE 2010
A very simple regression tree for two variables
(Diagram: a scatter of X3 against X2 with thresholds e1 and e2, and the corresponding tree: the root tests X2 > e1, one branch tests X2 > e2, and each leaf stores a predicted value of X3.)
The tree specifies X3 as a function of X2
Interior nodes
Leaf nodes
Prediction of Xj using a single tree
(Diagram: a single tree with split nodes and leaf nodes; the input x-j, the expression values of all genes except gene j, is passed down to a leaf.)
Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
Prediction is the mean at each leaf node
Algorithm for learning a regression tree
Quantifying a split on the tree
The set of samples in the left node
The set of samples in the right node
Sleft and Sright are sets of samples obtained by testing Xi for a particular split s
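The split score can be made concrete as the reduction in total variance achieved by the split, the same variance-based criterion GENIE3 builds on. A small sketch (toy data is illustrative):

```python
import numpy as np

def split_score(x_pred, x_target, threshold):
    """Variance reduction from splitting samples on x_pred > threshold:
    #S*Var(S) - #S_left*Var(S_left) - #S_right*Var(S_right)."""
    right = x_pred > threshold
    left = ~right
    score = len(x_target) * np.var(x_target)
    for mask in (left, right):
        if mask.any():
            score -= mask.sum() * np.var(x_target[mask])
    return score

x = np.array([1.0, 2.0, 3.0, 4.0])
target = np.array([0.0, 0.0, 10.0, 10.0])
print(split_score(x, target, 2.5))  # 100.0: the split separates the two groups
print(split_score(x, target, 1.5))  # smaller: a poor split leaves variance behind
```

Tree learning greedily picks, at each leaf, the predictor and threshold with the highest such score.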
One iteration of regression tree learning
(Diagram: the root tests X1 > e1; one branch tests X2 > e2; the three leaves hold Gaussians N1, N2, N3.)
Nl: Gaussian associated with leaf l
Iteration i
X1, X2, X4 will all be considered as candidate splits using the examples at each current leaf node
(Diagram, iteration i+1: the tree from iteration i is extended; one leaf is further split by the test X4 > e3, giving leaves N2, N3, N4, N5.)
Convergence criteria
An Ensemble of trees
Prediction using an Ensemble of Trees
(Diagram: an ensemble of T trees, tree1 … treeT, each with its own split and leaf nodes; the same input x-j is passed down every tree.)
Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
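The ensemble prediction is the average of the per-tree predictions. A sketch using bootstrap-resampled one-split "stump" trees in place of full regression trees (the stump fitting and the toy data are illustrative assumptions, not the GENIE3 implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_stump(x, y):
    """Fit a one-split regression tree: pick the threshold minimizing total SSE."""
    best_t, best_sse = None, np.inf
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = left.var() * len(left) + right.var() * len(right)
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t, y[x <= best_t].mean(), y[x > best_t].mean()

def stump_predict(stump, x):
    t, left_mean, right_mean = stump
    return np.where(x <= t, left_mean, right_mean)

# Toy target: a step function of x with small noise.
x = rng.uniform(0, 1, 200)
y = 2.0 * (x > 0.5) + 0.05 * rng.normal(size=200)

# Ensemble: fit T stumps on bootstrap resamples; predict by averaging.
T = 25
stumps = [fit_stump(x[idx], y[idx])
          for idx in (rng.integers(0, len(x), len(x)) for _ in range(T))]

x_new = np.array([0.1, 0.9])
pred = np.mean([stump_predict(s, x_new) for s in stumps], axis=0)
print(pred)  # close to [0, 2]
```

Averaging over resampled trees smooths the piecewise-constant predictions of any single tree.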
Expression data matrix for GENIE3
p Genes
N Experiments/Time points etc
Observations (expression levels) of all variables in sample i
Observations of Xj in all N experiments
GENIE3 will use the transpose of this data matrix
N Experiments
p Genes
Observations (expression levels) of all variables in sample i
GENIE3 algorithm sketch
Figure from Huynh-Thu et al., 2010
Predictor ranking
GENIE3 algorithm sketch
Learning fj in GENIE3
Computing the importance weight of a predictor
I(N) = #S · Var(S) − #St · Var(St) − #Sf · Var(Sf)
N: interior node
S: set of data samples that reach node N
#S: size of the set S
Var(S): variance of the output variable xj in set S
St: subset of S for which the test at N is true
Sf: subset of S for which the test at N is false
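As an illustration of this ranking step, scikit-learn's impurity-based feature importances can stand in for GENIE3's summed per-node variance reductions (an assumption: RandomForestRegressor as a stand-in for the paper's exact ensemble; the toy matrix is made up):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy expression matrix: 200 samples x 4 genes; gene 3 is driven by genes 0 and 1.
X = rng.normal(size=(200, 4))
X[:, 3] = 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

j = 3                   # target gene
predictors = [0, 1, 2]  # candidate regulators (all genes except j)
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:, predictors], X[:, j])

# Importances play the role of the summed variance reductions; sort to rank.
ranking = [predictors[i] for i in np.argsort(rf.feature_importances_)[::-1]]
print(ranking)  # genes 0 and 1 outrank the irrelevant gene 2
```

Repeating this per target gene and pooling the rankings yields the genome-wide edge list GENIE3 reports.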
Goals for today
Classes of methods for incorporating priors
Overview of the Inferelator algorithm
Greenfield et al. 2013, Bonneau et al. 2007
Modeling the relationship between regulator and target in Inferelator
Network inference: Estimate coefficients
Number of genes
Number of samples
Two approaches to integrate prior graph structure
Regularized regression
Depending upon f we may have different types of regularized regression frameworks
Regularization term
Regularized regression
Elastic net regression
Minimize: Σi (yi − Σp βp xip)²
Subject to: (1 − α)·‖β‖₁ + α·‖β‖₂² ≤ t
‖·‖₁: L1 norm; ‖·‖₂: L2 norm
Estimate α and t via cross-validation
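A minimal elastic-net fit, using scikit-learn's ElasticNet with illustrative alpha and l1_ratio values (cross-validation, as the slide notes, would normally choose them; the toy data is an assumption):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)

# Toy target gene driven by 2 of 10 candidate regulators.
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 4] + 0.1 * rng.normal(size=100)

# l1_ratio mixes the L1 (sparsity) and L2 (shrinkage) penalties.
model = ElasticNet(alpha=0.1, l1_ratio=0.7)
model.fit(X, y)
selected = np.flatnonzero(np.abs(model.coef_) > 0.1).tolist()
print(selected)  # regulators 0 and 4 survive the penalty
```

The L1 part zeroes out irrelevant regulators; the L2 part stabilizes the fit when regulators are correlated.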
Modified Elastic Net (MEN)
Set this < 1 so that if there is a prior edge xp -> yi, the corresponding regression coefficient is penalized less
Inferelator workflow
GENIE3 vs Inferelator
GENIE3
Inferelator
Goals for today
Evaluating the network
Assessing confidence in the learned network
How to assess confidence in graph features?
Bootstrap to assess graph feature confidence
Bootstrap/stability selection
Expression data
Final inferred network
Output
Input
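The bootstrap loop can be sketched generically; `infer_edges` below is a hypothetical stand-in for any network-inference call (here, just the single most correlated gene pair):

```python
import numpy as np

rng = np.random.default_rng(0)

def infer_edges(X):
    """Hypothetical stand-in for a network-inference method:
    return the most correlated gene pair as the inferred edge set."""
    C = np.abs(np.corrcoef(X.T))
    np.fill_diagonal(C, 0)
    i, j = np.unravel_index(np.argmax(C), C.shape)
    return {tuple(sorted((i, j)))}

# Toy data: gene 1 tracks gene 0.
X = rng.normal(size=(60, 4))
X[:, 1] = X[:, 0] + 0.2 * rng.normal(size=60)

# Bootstrap: re-infer on resampled datasets; count how often each edge recurs.
counts = {}
B = 50
for _ in range(B):
    idx = rng.integers(0, X.shape[0], X.shape[0])
    for e in infer_edges(X[idx]):
        counts[e] = counts.get(e, 0) + 1

confidence = {e: c / B for e, c in counts.items()}
print(confidence)  # the (0, 1) edge recurs with confidence near 1.0
```

Edges above a confidence threshold form the final inferred network.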
Does the bootstrap confidence represent real relationships?
(Control: randomize each row (gene) of the expression matrix independently across the experimental conditions.)
Slide credit Prof. Mark Craven
Bootstrap-based confidence differs between real and random data
(Plot: distributions of bootstrap confidence f for randomized vs. real data.)
Friedman et al 2000
Example of a high confidence sub-network
One learned Bayesian network
Bootstrapped confidence Bayesian network: highlights a subnetwork associated with yeast mating pathway. Colors indicate genes with known functions.
Nir Friedman, Science 2004
Area under the precision recall curve (AUPR)
Precision = (# of correct edges) / (# of predicted edges)
Recall = (# of correct edges) / (# of true edges)
F-score
Precision (P) = (# of correct edges) / (# of predicted edges)
Recall (R) = (# of correct edges) / (# of true edges)
F-score = 2·P·R / (P + R)
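These metrics over edge sets, as a small worked example (the edge names are illustrative):

```python
def edge_metrics(predicted, true):
    """Precision, recall, and F-score over sets of (regulator, target) edges."""
    correct = predicted & true
    precision = len(correct) / len(predicted)
    recall = len(correct) / len(true)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

pred = {("A", "B"), ("A", "C"), ("B", "D")}
true = {("A", "B"), ("B", "D"), ("C", "D"), ("A", "D")}
p, r, f = edge_metrics(pred, true)
print(p, r, f)  # 2/3 precision, 2/4 recall
```

Sweeping a confidence threshold over the predicted edges and recomputing precision and recall at each cutoff traces out the precision-recall curve whose area is the AUPR.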
Experimental datasets to assess network structure for gene regulatory networks
Siahpirani et al., 2016
AUPR based performance comparison
Results are for E. coli transcriptional regulatory network
DREAM: Dialogue on Reverse Engineering Assessment and Methods
Community effort to assess regulatory network inference
DREAM 5 challenge
Previous challenges: 2006, 2007, 2008, 2009, 2010
Marbach et al. 2012, Nature Methods
Where do different methods rank?
Marbach et al., 2012
Community
Random
These approaches were mostly per-gene
What happens when one adds noisy priors?
For BBSR and MEN, "low" and "high" denote less dense and more dense settings, respectively
High noise regime
Do priors help?
Siahpirani & Roy, Nucleic Acids Res, Volume 45, Issue 4, 28 February 2017, Page e21, https://doi.org/10.1093/nar/gkw963
Evaluation on the yeast regulatory network
Take away points