1 of 70

Dependency networks

Sep 22nd 2022

BMI/CS 775 Computational Network Biology, Fall 2022

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 70

Plan for this section

  • Representing molecular networks as probabilistic models (Sep 13th)
  • Learning directed graphical models: Bayesian networks and variations (Sep 13th,15th, 20th)
  • Dependency networks (Sep 22nd)
  • Undirected models and variations (Sep 27th)
  • Recent advances in graph learning (Sep 29th)
    • single cell RNA-seq network inference
    • Learning directed models of causality


3 of 70

Readings

  • Huynh-Thu, Vân A., Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts. 2010. “Inferring Regulatory Networks from Expression Data Using Tree-Based Methods.” PLOS ONE 5 (9): e12776+. https://doi.org/10.1371/journal.pone.0012776.
  • Greenfield, Alex, Christoph Hafemeister, and Richard Bonneau. 2013. “Robust Data-Driven Incorporation of Prior Knowledge into the Inference of Dynamic Regulatory Networks.” Bioinformatics (Oxford, England) 29 (8): 1060–1067. https://doi.org/10.1093/bioinformatics/btt099.

4 of 70

Let’s start out a bit abstractly


       E1     E2     E3     E4     E5     E6     E7
G1     8.82   9.09   6.43   8.11   7.13   7.55   9.18
G2     0.0    0.15   0.07   0.0    0.08   0.0    0.0
G3     0.0    0.0    0.0    0.0    0.1    0.08   0.26
G4     2.83   1.92   2.33   0.86   2.17   2.53   3.19
G5     2.0    1.32   1.13   0.72   1.25   1.17   2.27
G6     12.41  10.72  10.23  8.59   8.08   8.92   11.61
G7     0.02   0.68   0.0    0.0    0.0    0.0    0.0
G8     0.69   0.95   1.09   0.97   0.71   0.44   0.76
G9     0.11   0.22   0.12   0.21   0.24   0.2    0.43

  • How can we infer what the graph might be?

  • What assumptions do we need to make?

  • What will such a graph capture?

Rows: nodes. Columns: node measurements.

Suppose we had a large matrix with the nodes of an unobserved graph as rows and the measurements of node activity as columns.

5 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Dependency networks with priors
  • Evaluation of expression-based network inference methods

6 of 70

Recall the different types of probabilistic graphs

  • In each graph type we can assert different conditional independencies
  • Correlation networks
  • Markov networks
    • Gaussian Graphical models
  • Dependency networks
  • Bayesian networks

7 of 70

Dependency network

  • A type of probabilistic graphical model
  • Approximate Markov networks
    • Are much easier to learn from data
  • As in Bayesian networks, has
    • Graph structure
    • Parameters
  • Unlike Bayesian networks
    • Can have cyclic dependencies
    • Computing a joint probability is harder
      • It is approximated with a “pseudo” likelihood.

Dependency Networks for Inference, Collaborative Filtering, and Data Visualization

Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

8 of 70

Original motivation of dependency networks

  • Introduced by Heckerman, Chickering, Meek, et al 2000
  • Often, Bayesian networks can get confusing
    • Learned Bayesian networks represent correlation or predictive relationships
    • But the directionality of the edges is mistakenly interpreted as causal connections
  • (Consistent) Dependency networks were introduced to distinguish between these cases

9 of 70

Dependency network vs Bayesian network

Dependency Networks for Inference, Collaborative Filtering, and Data Visualization

Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

Bayesian network

Dependency network

Oftentimes, the Bayesian network on the left is read as if “Age” determines “Income”. However, all this model captures is that “Age” is predictive of “Income”.

10 of 70

Markov blanket

  • Let X be the set of all random variables
  • Let Bj denote the Markov Blanket of a variable Xj.
  • X-j denotes all variables in X other than Xj
  • The Markov blanket of Xj, denoted Bj, is the subset of X that makes Xj independent of all other variables: P(Xj | X-j) = P(Xj | Bj)

  • Bj captures all the information about Xj

11 of 70

Learning dependency networks

[Figure: variable Xj with unknown (“?”) edges to the members of its candidate Markov blanket Bj.]

  • Given Bj, Xj is independent of all other variables, X-j

  • Bj can be estimated by finding the set of variables that best predict Xj
  • This requires us to specify the form of P(Xj|Bj)

fj=P(Xj|Bj)

Entails estimating the Markov blanket of each random variable

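As a concrete sketch of this idea, each variable can be regressed on all the others with a simple linear stand-in for fj; predictors with large coefficients form the estimated Markov blanket. The function name and threshold here are illustrative, and real methods use sparse or tree-based regressors instead:

```python
import numpy as np

def learn_dependency_network(X, threshold=0.2):
    """For each variable X_j (column j of the n-samples-by-p matrix X),
    regress X_j on all other columns X_-j by least squares and keep the
    predictors whose absolute coefficient exceeds `threshold` as the
    estimated Markov blanket B_j."""
    n, p = X.shape
    blankets = {}
    for j in range(p):
        others = [i for i in range(p) if i != j]
        coef, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
        blankets[j] = [others[k] for k, c in enumerate(coef)
                       if abs(c) > threshold]
    return blankets
```

Note the estimated blankets need not be symmetric; consistent dependency networks require a symmetrization step in practice.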

12 of 70

Different representations of fj=P(Xj|Bj)

  • If Xj is continuous
    • fj can be a linear function
    • fj can be a regression tree
    • fj can be an ensemble of trees
      • E.g. random forests

  • If Xj is discrete
    • fj can be a conditional probability table
    • fj can be a conditional probability tree

13 of 70

Popular dependency networks implementations

  • Learned by solving a set of linear regression problems
    • TIGRESS (Haury et al, 2010)
      • Uses a constraint to learn a “sparse” Markov blanket
      • Uses “stability selection” to estimate confidence of edges
    • Inferelator (Bonneau et al, Genome Biology 2005)
      • Can handle time course and single time point data
      • Handles linear and non-linear regression
  • Learned by solving a set of non-linear regression problems
    • Non-linearity captured by Regression Tree (Heckerman et al, 2000)
    • GENIE3: Non-linearity captured by Random forest (Huynh-Thu et al, 2010)

14 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Dependency networks with priors
  • Evaluation of expression-based network inference methods

15 of 70

Expression data matrix

p Genes (rows) × N Experiments/Time points (columns):

                      E1     E2     E3     E4     E5     E6     E7
ENSMUSG00000028180    8.82   9.09   6.43   8.11   7.13   7.55   9.18
ENSMUSG00000053211    0.0    0.15   0.07   0.0    0.08   0.0    0.0
ENSMUSG00000028182    0.0    0.0    0.0    0.0    0.1    0.08   0.26
ENSMUSG00000002017    2.83   1.92   2.33   0.86   2.17   2.53   3.19
ENSMUSG00000028184    2.0    1.32   1.13   0.72   1.25   1.17   2.27
ENSMUSG00000028187    12.41  10.72  10.23  8.59   8.08   8.92   11.61
ENSMUSG00000028186    0.02   0.68   0.0    0.0    0.0    0.0    0.0
ENSMUSG00000028189    0.69   0.95   1.09   0.97   0.71   0.44   0.76
ENSMUSG00000028188    0.11   0.22   0.12   0.21   0.24   0.2    0.43

Column i: the observations (expression levels) of all variables in sample i, x(i). Row j: the observations of variable Xj in all N experiments.

16 of 70

GENIE3: GEne Network Inference with Ensemble of trees

  • Solves a set of regression problems
    • One per random variable
    • Minimizes the prediction error per variable Xj

  • Uses an Ensemble of regression trees to represent fj
    • Models non-linear dependencies
  • Outputs a directed cyclic graph with a confidence of each edge
    • Directionality means “good predictor”
  • Focus on a ranking over edges rather than graph structure and parameters
    • Rank is determined by confidence

Inferring Regulatory Networks from Expression Data Using Tree-Based Methods Van Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, Pierre Geurts, Plos One 2010

17 of 70

A very simple regression tree for two variables

[Figure: scatter plot of X3 against X2, with thresholds e1 and e2 on X2. The corresponding tree first tests X2 > e1 (YES/NO), then X2 > e2, and each leaf gives a value of X3.]

The tree specifies X3 as a function of X2. Tests are made at interior nodes; predictions are stored at leaf nodes.

18 of 70

Prediction of Xj using a single tree

[Figure: a regression tree; the input x-j is routed through split nodes down to a leaf node.]

Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial

Prediction is the mean at each leaf node


23 of 70

Algorithm for learning a regression tree

  • Input: dataset D, variable Xj, candidate predictors X-j of Xj
  • Output: Tree T
  • Initialize T to a single leaf node whose prediction is estimated from all samples of Xj; assign all samples to this leaf

24 of 70

Algorithm for learning a regression tree

  • Repeat until a convergence criterion is met:
    • For each leaf node, consider every candidate predictor Xi in X-j and every threshold s on Xi
    • Score each candidate split by the reduction in variance of Xj it produces
    • Apply the best-scoring split: replace the leaf with an interior node testing Xi > s, and create two new leaves

25 of 70

Quantifying a split on the tree

  • A split s on a candidate predictor Xi partitions the set S of samples at a node into Sleft (the set of samples in the left node) and Sright (the set of samples in the right node)
  • The split is scored by the reduction it produces in the variance of Xj:

Var(S) - (#Sleft/#S)·Var(Sleft) - (#Sright/#S)·Var(Sright)
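This variance-reduction score can be sketched for a single candidate predictor using only the standard library (the function name is illustrative):

```python
import statistics

def best_split(xs, ys):
    """Score every threshold s on one candidate predictor `xs` by the
    weighted reduction in variance of the target `ys`, and return the
    best (threshold, variance reduction) pair."""
    n = len(ys)
    base = statistics.pvariance(ys)          # Var(S) before splitting
    best = (None, 0.0)
    for s in sorted(set(xs))[:-1]:           # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        score = (base
                 - len(left) / n * statistics.pvariance(left)
                 - len(right) / n * statistics.pvariance(right))
        if score > best[1]:
            best = (s, score)
    return best
```

A tree learner applies this search at every leaf, over every candidate predictor, and keeps the single best split.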

26 of 70

One iteration of regression tree learning

  • Let X={X1,X2,X3,X4}
  • Assume we are searching for the neighbors of X3 and it already has two neighbors X1 and X2

[Figure, iteration i: the current tree splits on X1 > e1 (YES/NO) and then X2 > e2, with leaves N1, N2, N3. Nl: Gaussian associated with leaf l.]

X1, X2, X4 will all be considered as candidate splits using the examples at each current leaf node

27 of 70

One iteration of regression tree learning

  • If we split on X4, then we will have a new neighbor.

[Figure: at iteration i, the tree splits on X1 > e1 and X2 > e2, with leaves N1, N2, N3 (Nl: Gaussian associated with leaf l). At iteration i+1, splitting a leaf on X4 > e3 yields leaves N2, N3, N4, N5.]

28 of 70

Convergence criteria

  • Minimum number of examples at a leaf node
  • Depth of a tree
  • Error tolerance

29 of 70

An Ensemble of trees

  • A single tree is prone to “overfitting”
  • Instead of learning a single tree, ensemble models make use of a collection of trees

30 of 70

Prediction using an Ensemble of Trees

[Figure: an ensemble of T trees, tree1 … treeT; the input x-j is routed through each tree's split nodes to a leaf, and each tree makes its own prediction.]

Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial


34 of 70

Prediction using an Ensemble of Trees

    • Prediction is the average of the individual tree predictions: for T trees, the predicted value of Xj is (1/T) Σt ft(x-j)


Taken from ICCV09 tutorial by Kim, Shotton and Stenger: http://www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial

35 of 70

Expression data matrix for GENIE3

p Genes (rows) × N Experiments/Time points (columns). Column i: the observations (expression levels) of all variables in sample i. Row j: the observations of Xj in all N experiments.

GENIE3 will use the transpose of this data matrix: N Experiments (rows) × p Genes (columns), so that each row holds the observations (expression levels) of all variables in sample i.

36 of 70

GENIE3 algorithm sketch

Figure from Huynh-Thu et al., 2010

Predictor ranking

37 of 70

GENIE3 algorithm sketch

  • For each Xj, generate learning samples of input/output pairs
    • LSj={(x-jk,xjk), k=1..N}
    • On each LSj learn fj to predict the value of Xj
    • fj is an ensemble of regression trees
    • Estimate importance wij for all genes i ≠ j
      • wij quantifies the confidence of the edge between Xi and Xj
      • Associated with the decrease in variance of Xj when Xi is included in fj
  • Generate a global ranking of edges based on each wij
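A minimal sketch of this per-gene loop, standing in scikit-learn's RandomForestRegressor feature importances for the wij (parameter values and the function name are illustrative, not the authors' implementation):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def genie3_scores(X, n_trees=100, seed=0):
    """GENIE3-style sketch: for each gene j, fit a random forest that
    predicts column j of X from all other columns; the forest's feature
    importances become edge scores W[i, j] (regulator i -> target j)."""
    n, p = X.shape
    W = np.zeros((p, p))
    for j in range(p):
        others = [i for i in range(p) if i != j]
        rf = RandomForestRegressor(n_estimators=n_trees,
                                   max_features="sqrt",
                                   random_state=seed)
        rf.fit(X[:, others], X[:, j])
        W[others, j] = rf.feature_importances_
    return W  # rank all (i, j) pairs by W[i, j] for the global ranking
```

Because each target is fit independently, the resulting graph is directed and may contain cycles.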

38 of 70

Learning fj in GENIE3

  • Uses two types of Ensembles to represent the fj:
    • Random forest or Extra Trees
  • Learning the Random forest
    • Generate M=1000 bootstrap samples and learn 1000 trees, one from each sample
    • At each node to be split, search for best split among K randomly selected variables
    • K was set to p-1 or sqrt(p-1), where p is the number of regulators/parents
  • Learning the Extra-Trees
    • Learn 1000 trees
    • Each tree is built from the original learning sample
    • At each node, the best split is determined among K random splits, each split determined by randomly selecting one input (without replacement) and a threshold

39 of 70

Computing the importance weight of a predictor

  • Importance is computed at each interior node
    • Remember each predictor can show up multiple times as interior nodes
  • For an interior node, importance is given by the reduction in variance when splitting on that node

For an interior node N, let S be the set of data samples that reach N. The importance of the split at N is

I(N) = #S · Var(S) - #St · Var(St) - #Sf · Var(Sf)

where #S is the size of the set S, Var(S) is the variance of the output variable xj in S, St is the subset of S for which the test at N is true, and Sf is the subset for which it is false.

40 of 70

Computing the importance weight of a predictor

  • For a single tree, the overall importance of a predictor is the sum of I(N) over all interior nodes where that predictor is used to split
  • For an ensemble, the importance is averaged over all trees
  • To avoid bias towards highly variable genes, the expression of each gene is normalized to unit variance

41 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Dependency networks with priors
  • Evaluation of expression-based network inference methods

42 of 70

Classes of methods for incorporating priors

  • Parameter prior based approaches
    • Inferelator (Greenfield et al., Bioinformatics 2013)
    • Lirnet (Lee et al., Plos computational biology 2009)
  • Structure prior based approaches
    • Dynamic Bayesian network (Hill et al., Bioinformatics, 2012, Werhli et al., 2007)
    • Physical module networks (Novershtern et al., Bioinformatics 2011)
    • MERLIN-P (Siahpirani et al.,2016)

43 of 70

Overview of the Inferelator algorithm

  • Based on linear regression models
  • Handles time series and steady state data
  • Prior is incorporated at the edge weight using two strategies
    • Modified Elastic Net
    • Bayesian Best Subset Regression

Greenfield et al. 2013, Bonneau et al. 2007

44 of 70

Modeling the relationship between regulator and target in Inferelator

  • Time series: the temporal change in the target is a linear function of its regulators:

    (xi(t + Δt) - xi(t)) / Δt = -αi xi(t) + Σj βij xj(t)

  • Steady state: xi = Σj βij xj

Network inference: estimate the coefficients βij for each of the p genes from the N samples; a nonzero βij corresponds to an edge from regulator j to target gene i.

45 of 70

Two approaches to integrate prior graph structure

  • Modified Elastic Net (MEN)
    • Regularized regression-based framework

  • Bayesian Best Subset Regression (BBSR)
    • Probabilistic framework with exhaustive search on regulator sets

46 of 70

Regularized regression

  • The regularized regression framework can be generally described as follows: for response y and predictors X, minimize over β

    ||y - Xβ||² + λ f(β)

where f(β) is the regularization term. Depending upon f we may have different types of regularized regression frameworks.
47 of 70

Regularized regression

  • The regularization term f(β) takes the form of some norm of the coefficient vector β
  • L1 norm: f(β) = Σp |βp|
  • L2 norm: f(β) = Σp βp²
  • Elastic net (Zou & Hastie 2005): a weighted combination of the two, f(β) = α Σp |βp| + (1 - α) Σp βp²
48 of 70

Elastic net regression

  • Elastic net regression objective for the ith gene: minimize over βi

    ||yi - Xβi||² + λ [α ||βi||1 + (1 - α) ||βi||2²]

  • Which can be equivalently written as a constrained problem:

    Minimize ||yi - Xβi||², subject to α ||βi||1 + (1 - α) ||βi||2² ≤ t

  • The penalty (λ or t) is estimated via cross validation; ||βi||1 is the L1 norm and ||βi||2² the squared L2 norm
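A small scikit-learn sketch of selecting regulators for one target gene with elastic net, with the penalty chosen by cross validation (the simulated data, threshold, and variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Simulated target gene driven by regulators 0 and 3 out of 5 candidates.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 1.5 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)

# Fit with cross-validated penalty; keep predictors with large coefficients.
model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
edges = np.flatnonzero(np.abs(model.coef_) > 0.1)
```

With this strong simulated signal, `edges` should recover regulators 0 and 3.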

49 of 70

Modified Elastic Net (MEN)

  • The modification to Elastic net: give each predictor its own penalty weight θp, so coefficient βip is penalized by λ·θp instead of λ

Set θp < 1 if there is a prior edge xp -> yi, so that the regression coefficient for a prior-supported regulator is penalized less

50 of 70

Inferelator workflow

51 of 70

GENIE3 vs Inferelator

GENIE3

  • Non-linear regression model
  • Ensemble of trees: Random Forests/Extra Trees

Inferelator

  • Linear regression model
  • Ensemble of linear regression models
  • Incorporates prior information

52 of 70

Goals for today

  • Dependency networks
  • GENIE3: GEne Network Inference with Ensemble of trees
  • Dependency networks with priors
  • Evaluation of expression-based network inference methods

53 of 70

Evaluating the network

  • Assessing confidence
  • Area under the precision recall curve
  • F-score
  • Do gene modules participate in coherent function?
  • Can the network predict expression in a new condition?

54 of 70

Assessing confidence in the learned network

  • Typically the number of training samples is not sufficient to reliably determine the “right” network
  • One can however estimate the confidence of specific features of the network
    • Graph features f(G)
  • Examples of f(G)
    • An edge between two random variables
    • Order relations: Is X an ancestor of Y?

55 of 70

How to assess confidence in graph features?

  • What we want is P(f(G)|D), which is the feature averaged over the posterior of all graphs:

    P(f(G)|D) = ΣG f(G) P(G|D)

  • But it is not feasible to compute this sum over the super-exponential number of graphs

  • Instead we will use a “bootstrap” procedure

56 of 70

Bootstrap to assess graph feature confidence

  • For i = 1 to m
    • Construct dataset Di by sampling with replacement N samples from dataset D, where N is the size of the original D
    • Learn a graphical model {Gi,Θi} from Di
  • For each feature of interest f, calculate confidence as the fraction of bootstrap networks in which it appears:

    conf(f) = (1/m) Σi f(Gi)
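The bootstrap procedure can be sketched in numpy; the graph learner is supplied by the caller, and the correlation-threshold learner below is purely illustrative:

```python
import numpy as np

def edge_confidence(X, learn_graph, m=100, seed=0):
    """Resample the n rows of X with replacement m times, learn a graph
    (boolean adjacency matrix) from each replicate, and return each
    edge's confidence: the fraction of replicates that contain it."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros((p, p))
    for _ in range(m):
        Di = X[rng.integers(0, n, size=n)]   # bootstrap sample D_i
        counts += learn_graph(Di)
    return counts / m

def corr_graph(D, cut=0.5):
    """Toy graph learner: connect genes whose |correlation| exceeds cut."""
    C = np.abs(np.corrcoef(D, rowvar=False))
    np.fill_diagonal(C, 0.0)
    return C > cut
```

Any learner that returns an adjacency matrix (e.g. a thresholded GENIE3 ranking) can be dropped in for `corr_graph`.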

57 of 70

Bootstrap/stability selection


[Figure: input expression data → bootstrap samples → one network learned per sample → final inferred network (output).]

59 of 70

Does the bootstrap confidence represent real relationships?

  • Compare the confidence distribution to that obtained from randomized data
  • Shuffle the columns of each row (gene) separately
  • Repeat the bootstrap procedure

[Figure: matrix of genes (rows) × experimental conditions (columns); randomize each row independently.]

Slide credit Prof. Mark Craven

60 of 70

Bootstrap-based confidence differs between real and random data

[Figure: distributions of bootstrap confidence f for random vs. real data.]

Friedman et al 2000

61 of 70

Example of a high confidence sub-network

One learned Bayesian network

Bootstrapped confidence Bayesian network: highlights a subnetwork associated with yeast mating pathway. Colors indicate genes with known functions.

Nir Friedman, Science 2004

62 of 70

Area under the precision recall curve (AUPR)

  • Assume we know what the “right” network is
  • One can use Precision-Recall curves to evaluate the predicted network
  • Area under the PR curve (AUPR) curve quantifies performance

Precision = (# of correct edges) / (# of predicted edges)

Recall = (# of correct edges) / (# of true edges)
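One simple estimator of AUPR is average precision: walk down the confidence-ranked edge list and sum the precision at every rank where a true edge is recovered (a sketch, not the evaluation code of the cited papers):

```python
def aupr(ranked_edges, true_edges):
    """Average precision of a confidence-ranked edge list against a
    gold-standard edge set."""
    true_edges = set(true_edges)
    tp, area = 0, 0.0
    for k, e in enumerate(ranked_edges, start=1):
        if e in true_edges:
            tp += 1
            area += tp / k       # precision at each recall increment
    return area / len(true_edges)
```

A random ranking gives an AUPR near the fraction of all edges that are true, so AUPR is judged relative to that baseline.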

63 of 70

F-score

  • Assume we know what the “right” network is
  • Suppose your predicted network has N edges
    • Selected by sorting edges by confidence/strength of interaction and taking the top N edges

Precision (P) = (# of correct edges) / (# of predicted edges)

Recall (R) = (# of correct edges) / (# of true edges)

F-score = 2·P·R / (P + R)
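The same definitions, as a small helper over edge sets (names are illustrative):

```python
def edge_f_score(predicted, true):
    """Precision, recall, and F-score of a predicted edge set against a
    gold-standard edge set; edges are (regulator, target) tuples."""
    predicted, true = set(predicted), set(true)
    correct = len(predicted & true)
    precision = correct / len(predicted)
    recall = correct / len(true)
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f
```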

64 of 70

Experimental datasets to assess network structure for gene regulatory networks

    • Sequence specific motifs

    • ChIP-chip and ChIP-seq

    • Factor knockout followed by whole-transcriptome profiling

Siahpirani et al., 2016

65 of 70

AUPR based performance comparison

Results are for E. coli transcriptional regulatory network

66 of 70

DREAM: Dialogue for reverse engineering assessments and methods

Community effort to assess regulatory network inference

DREAM 5 challenge

Previous challenges: 2006, 2007, 2008, 2009, 2010

Marbach et al. 2012, Nature Methods

67 of 70

Where do different methods rank?

Marbach et al., 2012

Community

Random

These approaches were mostly per-gene

68 of 70

What happens when one adds noisy priors?

Low and high for BBSR and MEN refer to less dense and more dense priors, respectively

High noise regime

69 of 70

Do priors help?

Siahpirani & Roy, Nucleic Acids Res, Volume 45, Issue 4, 28 February 2017, Page e21, https://doi.org/10.1093/nar/gkw963

Evaluation on the yeast regulatory network

70 of 70

Take away points

  • Dependency networks learn graph structures by solving a set of regression problems
    • Easier to learn than Bayesian networks, but harder to reason with joint probabilities
    • In practice they often perform better than Bayesian networks
  • Common Dependency network inference algorithms
    • GENIE3
    • Inferelator
  • Dependency networks + priors
    • Inferelator: Modified Elastic Net
    • MERLIN-P
    • Addition of priors was beneficial in yeast