Dependency networks
Sep 23rd 2021
BMI 826-23 Computational Network Biology, Fall 2021
Sushmita Roy
Plan for this section
Readings
Goals for today
Recall the different types of probabilistic graphs
Dependency network
Dependency Networks for Inference, Collaborative Filtering and Data visualization
Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000
Original motivation of dependency networks
Dependency network vs Bayesian network
Dependency Networks for Inference, Collaborative Filtering and Data visualization
Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000
Bayesian network
Dependency network
Often, the Bayesian network on the left is read as if “Age” determines “Income”. However, all this model captures is that “Age” is predictive of “Income”.
Learning dependency networks
Figure: a dependency network in which each variable Xj is predicted from a set of variables Bj via fj = P(Xj|Bj).
Entails estimating the Markov blanket of each random variable Xj
Different representations of fj=P(Xj|Bj)
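The per-variable scheme above can be sketched with a linear fj, one of several possible representations. This is an illustrative sketch only; the function name, toy data, and edge threshold are all invented for this example:

```python
import numpy as np

def learn_dependency_network(X, threshold=0.1):
    # Illustrative sketch (name and threshold are invented): fit a linear
    # f_j for each variable X_j by least-squares regression on all other
    # variables, then keep edges whose |coefficient| is large.
    N, p = X.shape
    W = np.zeros((p, p))
    for j in range(p):
        others = [i for i in range(p) if i != j]
        A = np.column_stack([X[:, others], np.ones(N)])   # add intercept
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        W[others, j] = np.abs(coef[:-1])                  # drop intercept term
    return W >= threshold, W

# Toy data: x2 depends on x0; x1 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = 2.0 * x0 + 0.1 * rng.normal(size=200)
edges, W = learn_dependency_network(np.column_stack([x0, x1, x2]))
```

Because each variable is regressed on the others independently, the learned graph need not correspond to a single consistent joint distribution, which is the defining trade-off of dependency networks.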
Popular dependency networks implementations
Goals for today
Expression data matrix
p Genes
| Gene | E1 | E2 | E3 | E4 | E5 | E6 | E7 |
ENSMUSG00000028180 | 8.82 | 9.09 | 6.43 | 8.11 | 7.13 | 7.55 | 9.18 |
ENSMUSG00000053211 | 0.0 | 0.15 | 0.07 | 0.0 | 0.08 | 0.0 | 0.0 |
ENSMUSG00000028182 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.08 | 0.26 |
ENSMUSG00000002017 | 2.83 | 1.92 | 2.33 | 0.86 | 2.17 | 2.53 | 3.19 |
ENSMUSG00000028184 | 2.0 | 1.32 | 1.13 | 0.72 | 1.25 | 1.17 | 2.27 |
ENSMUSG00000028187 | 12.41 | 10.72 | 10.23 | 8.59 | 8.08 | 8.92 | 11.61 |
ENSMUSG00000028186 | 0.02 | 0.68 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
ENSMUSG00000028189 | 0.69 | 0.95 | 1.09 | 0.97 | 0.71 | 0.44 | 0.76 |
ENSMUSG00000028188 | 0.11 | 0.22 | 0.12 | 0.21 | 0.24 | 0.2 | 0.43 |
…
…
N Experiments/Time points
Observations (expression levels) of all variables in sample i, x(i)
Observations of variable Xj in all N experiments
GENIE3: GEne Network Inference with Ensemble of trees
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. Van Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, Pierre Geurts. PLoS ONE, 2010
A very simple regression tree for two variables
Figure: a regression tree with root test X2 > e1 and a second test X2 > e2; YES/NO branches lead to three leaves predicting X3.
The tree specifies X3 as a function of X2
Interior nodes
Leaf nodes
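The two-split tree above can be written as a small piecewise-constant function. The thresholds e1, e2 and the leaf means below are made-up values for illustration:

```python
def simple_regression_tree(x2):
    # The two-split tree from the slide: interior nodes test X2 against
    # thresholds e1 and e2; each leaf returns a constant prediction for X3.
    e1, e2 = 0.0, 1.0     # hypothetical split thresholds
    if x2 > e1:
        if x2 > e2:
            return 3.0    # leaf mean when X2 > e2
        return 1.5        # leaf mean when e1 < X2 <= e2
    return 0.2            # leaf mean when X2 <= e1
```

The tree thus specifies X3 as a piecewise-constant function of X2, with one constant per leaf.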
Prediction of Xj using a single tree
Figure: the input x-j is routed through split nodes to a leaf node.
Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
Prediction is the mean at each leaf node
Algorithm for learning a regression tree
Quantifying a split on the tree
Sleft: the set of samples in the left node
Sright: the set of samples in the right node
Sleft and Sright are the sets of samples obtained by testing Xi for a particular split s
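One standard way to score such a split, consistent with the variance-based quantities used later for predictor importance, is the reduction in sample-weighted variance. The toy arrays below are invented:

```python
import numpy as np

def split_score(xj, xi, s):
    # Variance-reduction score for splitting on predictor X_i at threshold s
    # when predicting target X_j: total variance of X_j minus the
    # sample-weighted variances of S_left (X_i <= s) and S_right (X_i > s).
    left = xj[xi <= s]
    right = xj[xi > s]
    n = len(xj)
    weighted = (len(left) / n) * np.var(left) + (len(right) / n) * np.var(right)
    return np.var(xj) - weighted   # larger = better split

xj = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])     # target values (bimodal)
xi = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])  # candidate predictor
good = split_score(xj, xi, 3.0)   # separates the two clusters of xj
bad = split_score(xj, xi, 1.0)    # leaves the clusters mixed
```

The split that cleanly separates the two clusters of target values gets the higher score, which is what the tree-growing step selects.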
One iteration of regression tree learning
Figure: tree at this iteration with tests X1 > e1 (root) and X2 > e2; YES/NO branches lead to leaves N1, N2, N3.
Nl: Gaussian associated with leaf l
Iteration i
Figure: the tree after one leaf is expanded with a new test X4 > e3; the leaves are now N2, N3, N4, N5.
Iteration i+1
Convergence criteria
An Ensemble of trees
Prediction using an Ensemble of Trees
Figure: an ensemble of T trees (tree1 … treeT); the input x-j is routed through each tree's split nodes to a leaf node.
Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
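The ensemble prediction simply averages the individual tree predictions. In this sketch each tree is any callable from predictor values to a prediction; the stump-like toy trees are invented:

```python
import numpy as np

def ensemble_predict(trees, x):
    # Ensemble prediction for X_j: the average of the predictions of
    # tree1 ... treeT on the same input.
    return float(np.mean([tree(x) for tree in trees]))

# Three hypothetical stump-like trees over a single predictor.
trees = [
    lambda x: 1.0 if x > 0 else 0.0,
    lambda x: 2.0 if x > 1 else 0.0,
    lambda x: 1.0 if x > -1 else -1.0,
]
pred = ensemble_predict(trees, 0.5)
```

Averaging many trees trained on perturbed data reduces the high variance of any single tree, which is why ensembles (random forests, extra-trees) predict more stably than one tree.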
Expression data matrix for GENIE3
p Genes
N Experiments/Time points etc
Observations (expression levels) of all variables in sample i
Observations of Xj in all N experiments
GENIE3 will use the transpose of this data matrix
N Experiments
p Genes
Observations (expression levels) of all variables in sample i
GENIE3 algorithm sketch
Figure from Huynh-Thu et al., 2010
Predictor ranking
GENIE3 algorithm sketch
Learning fj in GENIE3
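A minimal sketch of the GENIE3 decomposition, using scikit-learn's RandomForestRegressor in place of the paper's reference implementation (the function name, toy data, and parameter values are invented; the original also normalizes each expression profile):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # assumes scikit-learn

def genie3_sketch(X, n_trees=50, seed=0):
    # Sketch of the GENIE3 decomposition: for each gene j, fit a tree
    # ensemble predicting x_j from all other genes, and use its
    # variance-based importance scores as weights w[i, j] for the
    # directed edge regulator i -> target j.
    N, p = X.shape
    W = np.zeros((p, p))
    for j in range(p):
        others = [i for i in range(p) if i != j]
        rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        rf.fit(X[:, others], X[:, j])
        W[others, j] = rf.feature_importances_
    return W

# Toy network: gene 2 is driven by gene 0; gene 1 is noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 2] = 1.5 * X[:, 0] + 0.1 * rng.normal(size=100)
W = genie3_sketch(X)
```

Ranking all (i, j) pairs by w[i, j] then yields the global regulator-target edge ranking that GENIE3 reports.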
Computing the importance weight of a predictor
Interior node
Set of data samples that reach this node
#S: Size of the set S
Var(S): variance of the output variable xj in set S
St: subset of S when a test at N is true
Sf: subset of S when a test at N is false
Computing the importance weight of a predictor
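The node importance can be computed directly from the quantities defined above; a minimal sketch (function name is invented):

```python
import numpy as np

def node_importance(xj_S, test_true):
    # Importance of an interior node N from the slide's quantities:
    # I(N) = #S * Var(S) - #St * Var(St) - #Sf * Var(Sf), where S is the
    # set of samples reaching N, and St / Sf are the subsets for which
    # the node's test is true / false.
    S = np.asarray(xj_S, dtype=float)
    mask = np.asarray(test_true, dtype=bool)
    St, Sf = S[mask], S[~mask]
    def term(v):
        return len(v) * np.var(v) if len(v) else 0.0
    return term(S) - term(St) - term(Sf)

# A perfect split of a bimodal target removes all within-leaf variance.
I_split = node_importance([0, 0, 0, 1, 1, 1],
                          [True, True, True, False, False, False])
```

A predictor's overall importance then aggregates these per-node scores over every node that tests it, across all trees in the ensemble.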
Goals for today
Evaluating the network
Assessing confidence in the learned network
How to assess confidence in graph features?
Bootstrap to assess graph feature confidence
Bootstrap/stability selection
Expression data
Final inferred network
Output
Input
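The bootstrap scheme can be sketched as follows. Here `infer_network` stands in for any network inference routine returning an adjacency matrix; the correlation-threshold network below is just a toy stand-in, and all names are invented:

```python
import numpy as np

def bootstrap_confidence(X, infer_network, n_boot=50, seed=0):
    # Resample experiments (rows of X) with replacement, re-run inference
    # on each resample, and report per edge the fraction of runs in which
    # the edge was recovered.
    rng = np.random.default_rng(seed)
    N, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(n_boot):
        idx = rng.integers(0, N, size=N)   # bootstrap sample of rows
        counts += infer_network(X[idx])
    return counts / n_boot                 # edge confidence in [0, 1]

def corr_network(X, t=0.5):
    # Toy stand-in for an inference method: threshold |correlation|.
    C = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(C, 0.0)
    return C > t

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)     # b strongly coupled to a
c = rng.normal(size=200)               # c independent of a and b
conf = bootstrap_confidence(np.column_stack([a, b, c]), corr_network)
```

Edges driven by real signal are recovered in nearly every resample (confidence near 1), while spurious edges appear only sporadically.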
Does the bootstrap confidence represent real relationships?
Figure: null-data control in which each row (gene) of the genes x experimental-conditions matrix is randomized independently.
Slide credit Prof. Mark Craven
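The row-randomization control can be sketched as follows (function name is invented):

```python
import numpy as np

def permute_rows(X, seed=0):
    # Null-data control: shuffle each gene's (row's) values across
    # experimental conditions independently, destroying gene-gene
    # relationships while preserving each gene's value distribution.
    rng = np.random.default_rng(seed)
    Xp = X.copy()
    for g in range(Xp.shape[0]):
        rng.shuffle(Xp[g])   # in-place shuffle of one row
    return Xp

X = np.arange(12.0).reshape(3, 4)   # toy 3 genes x 4 conditions matrix
Xp = permute_rows(X, seed=1)
```

Running the same bootstrap-confidence procedure on the permuted matrix shows what confidence values arise by chance alone.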
Bootstrap-based confidence differs between real and random data
Figure: distributions of bootstrap confidence f for networks learned from random vs. real data.
Friedman et al., 2000
Example of a high confidence sub-network
One learned Bayesian network
Bootstrapped-confidence Bayesian network: highlights a subnetwork associated with the yeast mating pathway. Colors indicate genes with known functions.
Nir Friedman, Science 2004
Area under the precision recall curve (AUPR)
Precision = (# of correct edges) / (# of predicted edges)
Recall = (# of correct edges) / (# of true edges)
F-score
Precision (P) = (# of correct edges) / (# of predicted edges)
Recall (R) = (# of correct edges) / (# of true edges)
F-score = 2*P*R / (P + R)
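These metrics can be computed directly over edge sets; a small sketch with made-up edge sets:

```python
def edge_metrics(predicted, true):
    # Precision, recall, and F-score over edge sets, as defined on the
    # slides; `predicted` and `true` are sets of (regulator, target) pairs.
    correct = len(predicted & true)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(true) if true else 0.0
    denom = precision + recall
    fscore = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, fscore

# Hypothetical edge sets: 2 of 3 predictions are correct, and 2 of the
# 4 true edges are recovered.
true_edges = {(0, 1), (0, 2), (1, 2), (2, 3)}
pred_edges = {(0, 1), (0, 2), (3, 1)}
p, r, f = edge_metrics(pred_edges, true_edges)
```

Sweeping a confidence threshold over the ranked edge list and recording (recall, precision) at each cutoff traces the precision-recall curve whose area is the AUPR.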
Experimental datasets to assess network structure for gene regulatory networks
Siahpirani et al., 2016
AUPR based performance comparison
Results are for E. coli transcriptional regulatory network
DREAM: Dialogue on Reverse Engineering Assessment and Methods
Community effort to assess regulatory network inference
DREAM 5 challenge
Previous challenges: 2006, 2007, 2008, 2009, 2010
Marbach et al. 2012, Nature Methods
Where do different methods rank?
Marbach et al., 2012
Community
Random
These approaches were mostly per-gene
Some comments about expression-based network inference methods
References