1 of 41

Priors in Dependency network learning

Sep 30th 2021

BMI 826-23 Computational Network Biology, Fall 2021

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 41

Goals for this lecture

  • Incorporating priors in Dependency networks using linear regression
  • Incorporating priors in Dependency networks using tree models

3 of 41

Readings

  • Greenfield, Alex, Christoph Hafemeister, and Richard Bonneau. 2013. “Robust Data-Driven Incorporation of Prior Knowledge into the Inference of Dynamic Regulatory Networks.” Bioinformatics (Oxford, England) 29 (8): 1060–1067. https://doi.org/10.1093/bioinformatics/btt099.
  • (Optional) Petralia F, Wang P, Yang J, Tu Z. Integrative random forest for gene regulatory network inference. Bioinformatics. 2015;31(12):i197-205. doi:10.1093/bioinformatics/btv268

4 of 41

Recall Dependency networks

  • A type of probabilistic graphical model
  • Approximate Markov networks
    • Are much easier to learn from data
  • As in Bayesian networks, they have
    • A graph structure
    • Parameters capturing dependencies between a variable and its parents
  • Unlike Bayesian networks
    • Can have cyclic dependencies
    • Computing a joint probability is harder
      • It is approximated with a “pseudo” likelihood.

Dependency Networks for Inference, Collaborative Filtering and Data Visualization

Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

5 of 41

Recall Learning dependency networks

  • Let Bj denote the Markov Blanket of a variable Xj.
  • Bj is the set of variables that make Xj independent of all other variables, X-j

  • Bj can be estimated by finding the set of variables that best predict Xj
  • This requires us to specify the form of P(Xj|Bj)

fj = P(Xj | Bj)

  • One can think about this problem as estimating the Markov blanket of each random variable

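As a concrete illustration of this idea, each variable can be regressed on all the others and its strongest predictors taken as a candidate Markov blanket. The sketch below is a minimal, hypothetical version (plain least squares with an assumed coefficient threshold), not the estimator of any particular paper:

```python
import numpy as np

def candidate_markov_blankets(X, thresh=0.3):
    """For each variable X_j, regress it on all other variables X_{-j} and
    return a boolean matrix whose column j marks the candidate blanket B_j."""
    n, p = X.shape
    W = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # least-squares fit of X_j on X_{-j}; fj approximates P(Xj | Bj)
        beta, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
        W[others, j] = beta
    # keep only predictors with large coefficients as blanket candidates
    return np.abs(W) > thresh
```

In practice a sparse regression (e.g., LASSO) replaces the threshold, as the following slides develop.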

6 of 41

Classes of methods for incorporating priors

  • Parameter prior based approaches
      • Inferelator (Greenfield et al., Bioinformatics 2013)
      • Lirnet (Lee et al., PLoS Computational Biology 2009)
  • Structure prior based approaches
      • Dynamic Bayesian network (Hill et al., Bioinformatics, 2012, Werhli et al., 2007)
      • Physical module networks (Novershtern et al., Bioinformatics 2011)
      • MERLIN-P (Siahpirani et al., 2016)

7 of 41

Overview of the Inferelator algorithm

  • Based on linear regression models
  • Handles time series and steady state data
  • Prior is incorporated at the edge weight using two strategies
    • Modified Elastic Net
    • Bayesian Best Subset Regression

Greenfield et al. 2013, Bonneau et al. 2007

8 of 41

Notation

  • yi: expression level of target gene i
  • xp: expression level of regulator p
  • βi,p: regression coefficient linking regulator p to target gene i
  • N: number of samples

9 of 41

Modeling the relationship between regulator and target in Inferelator

  • Time series:

    y_i(t + m) = Σ_p β_{i,p} x_p(t), where m is the time lag

  • Steady state:

    y_i = Σ_p β_{i,p} x_p

Network inference: estimate the coefficients β_{i,p}, one regression per target gene (number of genes regressions, each fit across the available samples)
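The time-series model (each gene at time t + m predicted from regulator expression at time t) can be sketched with ordinary least squares. This is a simplification for illustration; the actual Inferelator uses regularized regression:

```python
import numpy as np

def estimate_coefficients(X, m=1):
    """X: time points x genes. Fit x_i(t+m) = sum_p beta_{i,p} x_p(t) for all
    genes jointly. Returns a genes x genes coefficient matrix B, where
    B[p, i] links regulator p to target i."""
    X_past, X_future = X[:-m], X[m:]       # align each time point with its m-step future
    B, *_ = np.linalg.lstsq(X_past, X_future, rcond=None)
    return B
```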

10 of 41

Two approaches to integrate prior graph structure

  • Modified Elastic Net (MEN)

  • Bayesian Best Subset Regression (BBSR)

11 of 41

Regularized regression

  • The regularized regression framework can be generally described as follows:

    β̂ = argmin_β Σ_i (y_i − Σ_p β_p x_{i,p})² + λ f(β)

Depending upon the regularization term f we may have different types of regularized regression frameworks

12 of 41

Regularized regression

  • f(β) takes the form of some norm of β
  • L1 norm: f(β) = Σ_p |β_p|

  • L2 norm: f(β) = Σ_p β_p²

  • Elastic net (Zou & Hastie 2005): f(β) = (1 − α) Σ_p |β_p| + α Σ_p β_p²

13 of 41

Elastic net regression

  • If there are correlated predictors, LASSO will arbitrarily include one and exclude the others
  • Elastic net provides a tradeoff between ridge and LASSO.

14 of 41

Elastic net regression

  • Elastic net regression objective for the ith gene:

    Minimize Σ_n (y_{i,n} − Σ_p β_p x_{p,n})²
    Subject to (1 − α) Σ_p |β_p| + α Σ_p β_p² ≤ t    (L1 norm and L2 norm terms)

  • Which can be equivalently written as:

    Minimize Σ_n (y_{i,n} − Σ_p β_p x_{p,n})² + λ [(1 − α) Σ_p |β_p| + α Σ_p β_p²]

λ and α are estimated via cross validation
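A minimal scikit-learn sketch of this objective (the dataset and hyperparameter values below are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)     # two highly correlated predictors
y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=100)

# l1_ratio blends the L1 (LASSO) and L2 (ridge) penalties;
# the L2 component lets correlated predictors share weight
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)  # inspect the fitted coefficients
```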

15 of 41

Modified Elastic Net (MEN)

  • The modification to Elastic net: each predictor p gets its own penalty factor θ_p

    Minimize Σ_n (y_{i,n} − Σ_p β_p x_{p,n})² + λ Σ_p θ_p [(1 − α) |β_p| + α β_p²]

Set θ_p < 1 so that if there is a prior edge x_p → y_i, the regression coefficient β_p is penalized less
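A per-predictor penalty factor (call it θ_p, with θ_p < 1 for prior edges) can be emulated with a standard rescaling trick: dividing column x_p by θ_p before running plain LASSO is equivalent to multiplying β_p's penalty by θ_p. The function name and values below are illustrative assumptions, shown with LASSO for simplicity rather than the full elastic net:

```python
import numpy as np
from sklearn.linear_model import Lasso

def prior_weighted_lasso(X, y, theta, alpha=0.1):
    """theta: per-predictor penalty factors (theta_p < 1 for prior edges).
    Rescaling columns by 1/theta_p makes plain LASSO penalize beta_p by
    alpha * theta_p instead of alpha."""
    theta = np.asarray(theta, dtype=float)
    model = Lasso(alpha=alpha).fit(X / theta, y)
    return model.coef_ / theta          # map coefficients back to the original scale
```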

16 of 41

Two approaches to integrate prior graph structure

  • Modified Elastic Net (MEN)

  • Bayesian Best Subset Regression (BBSR)

17 of 41

Probabilistic interpretation for the one predictor case

  • Consider a linear model for one predictor:

    y = β_0 + β_1 x + ε

  • Assume the error ε is distributed according to a Gaussian with mean 0 and variance σ²

  • How do we estimate β_0, β_1 from N datapoints?
    • Maximize the likelihood of the data given the model

18 of 41

Maximum Likelihood estimate of β_1

  • Likelihood of data:

    L(D) = Π_n (1/√(2πσ²)) exp(−(y_n − β_0 − β_1 x_n)² / (2σ²))

Taking log:

    log L(D) = −(N/2) log(2πσ²) − (1/(2σ²)) Σ_n (y_n − β_0 − β_1 x_n)²

Deriving wrt β_1 and setting to 0:

    β̂_1 = Σ_n (x_n − x̄)(y_n − ȳ) / Σ_n (x_n − x̄)²

Would get the same answer if minimizing the Residual Sum of Squares (RSS)
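A quick numeric check of the closed-form estimate on simulated data (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 + 2.0 * x + rng.normal(scale=0.3, size=200)   # true beta_0 = 0.5, beta_1 = 2.0

# ML estimate under Gaussian noise coincides with the least-squares solution
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
```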

19 of 41

Probabilistic interpretation in case of p inputs

  • Assume output Y is a linear function of the p inputs plus Gaussian noise:

    Y = Σ_p β_p X_p + ε,  ε ~ N(0, σ²)

  • Again we can compute the likelihood and maximize it to find β
  • Again the ML estimate is the same as the one derived by minimizing the RSS

20 of 41

Bayesian framework to estimate parameters

  • Instead of optimizing the likelihood, we put a prior on the parameters and optimize the posterior probability of the parameters:

    P(β | D) ∝ P(D | β) · P(β)
               (Gaussian data likelihood · parameter prior)

What types of priors can we use?

21 of 41

Priors on parameters in regression

  • Gaussian prior: P(β_p) ∝ exp(−λ β_p²)

    • Also called ridge regression
  • Laplace prior: P(β_p) ∝ exp(−λ |β_p|)

    • Also called Lasso regression
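For the Gaussian prior, the MAP estimate has a closed form (ridge regression). A minimal sketch, with λ standing for the ratio of noise variance to prior variance:

```python
import numpy as np

def ridge_map(X, y, lam):
    """MAP estimate of beta under y ~ N(X beta, sigma^2 I) with a
    Gaussian prior beta ~ N(0, (sigma^2 / lam) I)."""
    p = X.shape[1]
    # closed form: (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Larger λ (a stronger prior) shrinks the coefficients toward zero; as λ → 0 the estimate approaches ordinary least squares.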

22 of 41

Bayesian Best Subset Regression (BBSR)

  • Based on a Bayesian framework of model selection
    • Search among all subsets of regulators and pick the one that best trades off data fit against model complexity
  • Assume that the expression level y (the response variable) is distributed according to a Gaussian distribution:

    y ~ N(Xβ, σ²I), where the columns of X are the regulators

  • Place a prior distribution on the parameters, and incorporate prior knowledge of interactions through it (a Zellner g-prior):

    β ~ N(β⁰, g σ² (XᵀX)⁻¹), where g is a number between 0 and infinity

23 of 41

BBSR continued

  • The posterior distribution over the parameters has mean:

    β̃ = (g/(g+1)) β̂_OLS + (1/(g+1)) β⁰

  • g can be tuned to provide a trade-off between the prior and the OLS solution
  • When g is larger, β̃ is closer to the OLS solution
  • When g is smaller, β̃ is closer to the prior β⁰
  • The prior mean β⁰ is set to be a vector of all 0s

24 of 41

BBSR continued

  • Inferelator uses a p-dimensional vector g = (g_1, …, g_p) for the p predictors

Predictors with a prior edge have g_p set to a large value g, pushing them more towards the OLS solution
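Under this prior the posterior mean interpolates between the prior mean and the OLS fit. The elementwise per-predictor version below is a simplified sketch of that shrinkage (the exact multivariate posterior involves (XᵀX)⁻¹; function name and data are assumptions):

```python
import numpy as np

def bbsr_style_posterior_mean(X, y, g, beta_prior=None):
    """Per-predictor shrinkage between a prior mean and the OLS solution:
    larger g_p -> closer to OLS, smaller g_p -> closer to the prior."""
    p = X.shape[1]
    beta_prior = np.zeros(p) if beta_prior is None else np.asarray(beta_prior)
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    g = np.asarray(g, dtype=float)
    # convex combination weighted by g_p / (1 + g_p)
    return (g / (1.0 + g)) * beta_ols + (1.0 / (1.0 + g)) * beta_prior
```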

25 of 41

BBSR model selection

  • The final step in BBSR is to determine the best model out of the 2^p possible predictor subsets
  • p cannot be very high: the approach restricts the number of candidate predictors per target to 10
  • The best model is the one that minimizes prediction error with the lowest model complexity, scored with the Bayesian Information Criterion (BIC)
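A brute-force sketch of best-subset selection scored with BIC (a stand-in illustration of the idea, not the paper's exact scoring):

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, max_p=10):
    """Exhaustively search all 2^p predictor subsets and return the one
    with the lowest BIC (prediction error penalized by model size)."""
    n, p = X.shape
    assert p <= max_p, "exhaustive search is only feasible for small p"
    best, best_bic = (), np.inf
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            if subset:
                beta, *_ = np.linalg.lstsq(X[:, subset], y, rcond=None)
                rss = np.sum((y - X[:, subset] @ beta) ** 2)
            else:
                rss = np.sum(y ** 2)          # intercept-free null model
            bic = n * np.log(rss / n) + len(subset) * np.log(n)
            if bic < best_bic:
                best, best_bic = subset, bic
    return best
```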

26 of 41

Experimental setup

  • Three datasets
    • DREAM4: In silico dataset with 100 nodes
    • E. coli dataset from DREAM5
    • B. subtilis dataset
  • Evaluation based on AUPR
    • Ranking of edges obtained from a bootstrapping strategy
  • Questions asked
    • How does the prior parameter affect the performance?
    • Does the prior hamper performance on parts of the network without prior support?
    • How robust is the framework to noisy priors?

27 of 41

Workflow of experiments

28 of 41

How does the prior parameter affect the performance?

29 of 41

Can the data discriminate between different types of prior edges?

In other words, is the incorporation of the prior data-driven?

Low-ranked prior interactions do not show a strong positive or negative correlation

30 of 41

Ability to recover new edges is not hampered on adding prior

DREAM4

E. coli

B. subtilis

Prior helps

Prior does not help

31 of 41

What happens when one adds noisy priors?

"Low" and "high" for BBSR and MEN refer to less dense or more dense priors

High noise regime

32 of 41

Summary

  • Extending the Inferelator linear regression model to incorporate priors
    • Regularized regression
    • Probabilistic priors on weights
  • Experiments suggest
    • The prior incorporation is data-driven
    • Adding prior is beneficial even when it is noisy

33 of 41

Goals for this lecture

  • Incorporating priors in Dependency networks using linear regression
  • Incorporating priors in Dependency networks using tree models

34 of 41

iRafNet

  • GENIE3 was shown to be one of the best performing expression-based algorithms
  • Can we extend the GENIE3 Random Forests based approach to incorporate priors?
  • iRafNet uses a weighted sampling scheme to incorporate information from different sources of data

Petralia et al. 2015, Bioinformatics

35 of 41

Weighted sampling algorithm in iRafNet

  • Each data source d provides a score for a regulator k and target j
  • Convert these scores to sampling weights, wk->j in a data source and score-specific way
  • For each node split, instead of sampling uniformly from N potential regulators, select a dataset d randomly and sample N regulators based on their weights in d
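The weighted candidate-sampling step can be sketched as follows (the data-source names and weight values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_split_candidates(source_weights, n_candidates):
    """source_weights: dict mapping data source -> sampling weights over
    regulators. Pick one data source at random, then draw candidate
    regulators for a node split (without replacement) by its weights."""
    source = rng.choice(sorted(source_weights))
    w = np.asarray(source_weights[source], dtype=float)
    # regulators with larger prior weight are sampled more often
    return rng.choice(len(w), size=n_candidates, replace=False, p=w / w.sum())
```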

36 of 41

iRafNet overview

Petralia et al. 2015, Bioinformatics

37 of 41

Constructing sampling weights

  • The prior knowledge is described as a set of weighted networks
  • Weights for selecting a regulator are derived in a dataset-specific manner
  • Undirected protein-protein interactions:
    • Weights derived from a diffusion process over graphs (we will see this in later lectures)
  • Time-series expression data:
    • The weight wj->k assesses how predictive gj's expression at time t is of gk's expression at the next time point t+1
    • Derive a P-value to assess the strength of the regression weight
    • Convert the P-value into a weight
  • Knockout data:
    • wj->k is derived in one of two ways:
    • If gk's expression changes significantly when gj is knocked out, wj->k is derived from the P-value
    • Otherwise it is derived based on the overlap of gj's and gk's knockout targets or knockout regulators

38 of 41

iRafNet application to real data

  • Ground truth
    • Significant interactions identified from ChIP-chip experiments of yeast
  • Expression dataset
    • This was a large study measuring gene expression in multiple yeast strains
  • Prior datasets (included other expression datasets)
    • Expression time course during cell cycle
    • Expression data of genetic knockouts of TFs
    • Protein-protein interactions from public databases (BioGRID, MINT, DIP)

39 of 41

Does adding prior help for iRafNet?

  • Evaluate on ChIP-chip network of yeast
  • Expression dataset
    • This was a large study measuring gene expression
  • Prior datasets (included other expression datasets)
    • Expression time course during cell cycle
    • Knockout data from Hu et al
    • Protein-protein interactions from public databases (BioGRID, MINT, DIP)

40 of 41

Concluding remarks

  • We have seen different ways to incorporate other data types to improve the quality of the inferred network
  • Bayesian networks with structure prior
    • Use an energy function to assess concordance
    • Sensitive to incorrect prior information
  • Dependency networks with priors
    • Linear regression approach aims to reduce the penalty on inferred edges
    • Tree-based approach enables a “biased” selection of regulators

41 of 41

Recent work with dependency networks and priors

  • Wang Y, Cho D-Y, Lee H, Fear J, Oliver B, Przytycka TM. Reprogramming of regulatory network using expression uncovers sex-specific gene regulation in Drosophila. Nat Commun. 2018;9(1):4061. doi:10.1038/s41467-018-06382-z

  • Miraldi ER, Pokrovskii M, Watters A, et al. Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells. Genome Res. 2019;29(3):449-463. doi:10.1101/gr.238253.118