1 of 41

Priors in Dependency network learning

Sep 30th 2021

BMI 826-23 Computational Network Biology, Fall 2021

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 41

Goals for this lecture

  • Incorporating priors in Dependency networks using linear regression
  • Incorporating priors in Dependency networks using tree models

3 of 41

Readings

  • Greenfield, Alex, Christoph Hafemeister, and Richard Bonneau. 2013. “Robust Data-Driven Incorporation of Prior Knowledge into the Inference of Dynamic Regulatory Networks.” Bioinformatics (Oxford, England) 29 (8): 1060–1067. https://doi.org/10.1093/bioinformatics/btt099.
  • (Optional) Petralia F, Wang P, Yang J, Tu Z. Integrative random forest for gene regulatory network inference. Bioinformatics. 2015;31(12):i197-205. doi:10.1093/bioinformatics/btv268

4 of 41

Recall Dependency networks

  • A type of probabilistic graphical model
  • Approximate Markov networks
    • Are much easier to learn from data
  • As in Bayesian networks, they have
    • A graph structure
    • Parameters capturing dependencies between a variable and its parents
  • Unlike Bayesian networks
    • Can have cyclic dependencies
    • Computing a joint probability is harder
      • It is approximated with a “pseudo” likelihood.

Dependency Networks for Inference, Collaborative Filtering and Data Visualization

Heckerman, Chickering, Meek, Rounthwaite, Kadie 2000

5 of 41

Recall Learning dependency networks

  • Let Bj denote the Markov Blanket of a variable Xj.
  • Bj is the set of variables that make Xj independent of all other variables, X-j

  • Bj can be estimated by finding the set of variables that best predict Xj
  • This requires us to specify the form of P(Xj|Bj)

fj = P(Xj | Bj)

  • One can think about this problem as estimating the Markov blanket of each random variable

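As a concrete illustration of this idea, each variable can be regressed on all the others and its strongest predictors taken as a candidate Markov blanket. The sketch below is a minimal, hypothetical version (plain least squares with an assumed coefficient threshold), not the estimator of any particular paper:

```python
import numpy as np

def candidate_markov_blankets(X, thresh=0.3):
    """For each variable X_j, regress it on all other variables X_{-j} and
    return a boolean matrix whose column j marks the candidate blanket B_j."""
    n, p = X.shape
    W = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # least-squares fit of X_j on X_{-j}; fj approximates P(Xj | Bj)
        beta, *_ = np.linalg.lstsq(X[:, others], X[:, j], rcond=None)
        W[others, j] = beta
    # keep only predictors with large coefficients as blanket candidates
    return np.abs(W) > thresh
```

In practice a sparse regression (e.g., LASSO) replaces the threshold, as the following slides develop.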

6 of 41

Classes of methods for incorporating priors

  • Parameter prior based approaches
      • Inferelator (Greenfield et al., Bioinformatics 2013)
      • Lirnet (Lee et al., PLoS Computational Biology 2009)
  • Structure prior based approaches
      • Dynamic Bayesian network (Hill et al., Bioinformatics, 2012, Werhli et al., 2007)
      • Physical module networks (Novershtern et al., Bioinformatics 2011)
      • MERLIN-P (Siahpirani et al., 2016)

7 of 41

Overview of the Inferelator algorithm

  • Based on linear regression models
  • Handles time series and steady state data
  • Prior is incorporated at the edge weight using two strategies
    • Modified Elastic Net
    • Bayesian Best Subset Regression

Greenfield et al. 2013, Bonneau et al. 2007

8 of 41

Notation

  • yi: expression level of target gene i
  • xp: expression level of regulator p
  • βi,p: regression coefficient linking regulator p to target gene i
  • N: number of samples

9 of 41

Modeling the relationship between regulator and target in Inferelator

  • Time series:

    y_i(t + m) = Σ_p β_{i,p} x_p(t), where m is the time lag

  • Steady state:

    y_i = Σ_p β_{i,p} x_p

Network inference: estimate the coefficients β_{i,p}, one regression per target gene (number of genes regressions, each fit across the available samples)
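The time-series model (each gene at time t + m predicted from regulator expression at time t) can be sketched with ordinary least squares. This is a simplification for illustration; the actual Inferelator uses regularized regression:

```python
import numpy as np

def estimate_coefficients(X, m=1):
    """X: time points x genes. Fit x_i(t+m) = sum_p beta_{i,p} x_p(t) for all
    genes jointly. Returns a genes x genes coefficient matrix B, where
    B[p, i] links regulator p to target i."""
    X_past, X_future = X[:-m], X[m:]       # align each time point with its m-step future
    B, *_ = np.linalg.lstsq(X_past, X_future, rcond=None)
    return B
```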

10 of 41

Two approaches to integrate prior graph structure

  • Modified Elastic Net (MEN)

  • Bayesian Best Subset Regression (BBSR)

11 of 41

Regularized regression

  • The regularized regression framework can be generally described as follows:

    β̂ = argmin_β Σ_i (y_i − Σ_p β_p x_{i,p})² + λ f(β)

Depending upon the regularization term f we may have different types of regularized regression frameworks

12 of 41

Regularized regression

  • f(β) takes the form of some norm of β
  • L1 norm: f(β) = Σ_p |β_p|

  • L2 norm: f(β) = Σ_p β_p²

  • Elastic net (Zou & Hastie 2005): f(β) = (1 − α) Σ_p |β_p| + α Σ_p β_p²

13 of 41

Elastic net regression

  • If there are correlated predictors, LASSO will arbitrarily include one and exclude the others
  • Elastic net provides a tradeoff between ridge and LASSO.

14 of 41

Elastic net regression

  • Elastic net regression objective for the ith gene:

    Minimize Σ_n (y_{i,n} − Σ_p β_p x_{p,n})²
    Subject to (1 − α) Σ_p |β_p| + α Σ_p β_p² ≤ t    (L1 norm and L2 norm terms)

  • Which can be equivalently written as:

    Minimize Σ_n (y_{i,n} − Σ_p β_p x_{p,n})² + λ [(1 − α) Σ_p |β_p| + α Σ_p β_p²]

λ and α are estimated via cross validation
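A minimal scikit-learn sketch of this objective (the dataset and hyperparameter values below are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)     # two highly correlated predictors
y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=100)

# l1_ratio blends the L1 (LASSO) and L2 (ridge) penalties;
# the L2 component lets correlated predictors share weight
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)  # inspect the fitted coefficients
```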

15 of 41

Modified Elastic Net (MEN)

  • The modification to Elastic net: each predictor p gets its own penalty factor θ_p

    Minimize Σ_n (y_{i,n} − Σ_p β_p x_{p,n})² + λ Σ_p θ_p [(1 − α) |β_p| + α β_p²]

Set θ_p < 1 so that if there is a prior edge x_p → y_i, the regression coefficient β_p is penalized less
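A per-predictor penalty factor (call it θ_p, with θ_p < 1 for prior edges) can be emulated with a standard rescaling trick: dividing column x_p by θ_p before running plain LASSO is equivalent to multiplying β_p's penalty by θ_p. The function name and values below are illustrative assumptions, shown with LASSO for simplicity rather than the full elastic net:

```python
import numpy as np
from sklearn.linear_model import Lasso

def prior_weighted_lasso(X, y, theta, alpha=0.1):
    """theta: per-predictor penalty factors (theta_p < 1 for prior edges).
    Rescaling columns by 1/theta_p makes plain LASSO penalize beta_p by
    alpha * theta_p instead of alpha."""
    theta = np.asarray(theta, dtype=float)
    model = Lasso(alpha=alpha).fit(X / theta, y)
    return model.coef_ / theta          # map coefficients back to the original scale
```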

16 of 41

Two approaches to integrate prior graph structure

  • Modified Elastic Net (MEN)

  • Bayesian Best Subset Regression (BBSR)

17 of 41

Probabilistic interpretation for the one predictor case

  • Consider a linear model for one predictor:

    y = β_0 + β_1 x + ε

  • Assume the error ε is distributed according to a Gaussian with mean 0 and variance σ²

  • How do we estimate β_0, β_1 from N datapoints?
    • Maximize the likelihood of the data given the model

18 of 41

Maximum Likelihood estimate of β_1

  • Likelihood of data:

    L(D) = Π_n (1/√(2πσ²)) exp(−(y_n − β_0 − β_1 x_n)² / (2σ²))

Taking log:

    log L(D) = −(N/2) log(2πσ²) − (1/(2σ²)) Σ_n (y_n − β_0 − β_1 x_n)²

Deriving wrt β_1 and setting to 0:

    β̂_1 = Σ_n (x_n − x̄)(y_n − ȳ) / Σ_n (x_n − x̄)²

Would get the same answer if minimizing the Residual Sum of Squares (RSS)
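A quick numeric check of the closed-form estimate on simulated data (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 + 2.0 * x + rng.normal(scale=0.3, size=200)   # true beta_0 = 0.5, beta_1 = 2.0

# ML estimate under Gaussian noise coincides with the least-squares solution
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
```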

19 of 41

Probabilistic interpretation in case of p inputs

  • Assume output Y is a linear function of the p inputs plus Gaussian noise:

    Y = Σ_p β_p X_p + ε,  ε ~ N(0, σ²)

  • Again we can compute the likelihood and maximize it to find β
  • Again the ML estimate is the same as the one derived by minimizing the RSS

20 of 41

Bayesian framework to estimate parameters

  • Instead of optimizing the likelihood, we put a prior on the parameters and optimize the posterior probability of the parameters:

    P(β | D) ∝ P(D | β) · P(β)
               (Gaussian data likelihood · parameter prior)

What types of priors can we use?

21 of 41

Priors on parameters in regression

  • Gaussian prior: P(β_p) ∝ exp(−λ β_p²)

    • Also called ridge regression
  • Laplace prior: P(β_p) ∝ exp(−λ |β_p|)

    • Also called Lasso regression
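For the Gaussian prior, the MAP estimate has a closed form (ridge regression). A minimal sketch, with λ standing for the ratio of noise variance to prior variance:

```python
import numpy as np

def ridge_map(X, y, lam):
    """MAP estimate of beta under y ~ N(X beta, sigma^2 I) with a
    Gaussian prior beta ~ N(0, (sigma^2 / lam) I)."""
    p = X.shape[1]
    # closed form: (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Larger λ (a stronger prior) shrinks the coefficients toward zero; as λ → 0 the estimate approaches ordinary least squares.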

22 of 41

Bayesian Best Subset Regression (BBSR)

  • Based on a Bayesian framework of model selection
    • Search among all subsets of regulators and pick the one that best trades off data fit against model complexity
  • Assume that the expression level y (the response variable) is distributed according to a Gaussian distribution:

    y ~ N(Xβ, σ²I), where the columns of X are the regulators

  • Place a prior distribution on the parameters, and incorporate prior knowledge of interactions through it (a Zellner g-prior):

    β ~ N(β⁰, g σ² (XᵀX)⁻¹), where g is a number between 0 and infinity

23 of 41

BBSR continued

  • The posterior distribution over the parameters has mean:

    β̃ = (g/(g+1)) β̂_OLS + (1/(g+1)) β⁰

  • g can be tuned to provide a trade-off between the prior and the OLS solution
  • When g is larger, β̃ is closer to the OLS solution
  • When g is smaller, β̃ is closer to the prior β⁰
  • The prior mean β⁰ is set to be a vector of all 0s

24 of 41

BBSR continued

  • Inferelator uses a p-dimensional vector g = (g_1, …, g_p) for the p predictors

Predictors with a prior edge have g_p set to a large value g, pushing them more towards the OLS solution
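Under this prior the posterior mean interpolates between the prior mean and the OLS fit. The elementwise per-predictor version below is a simplified sketch of that shrinkage (the exact multivariate posterior involves (XᵀX)⁻¹; function name and data are assumptions):

```python
import numpy as np

def bbsr_style_posterior_mean(X, y, g, beta_prior=None):
    """Per-predictor shrinkage between a prior mean and the OLS solution:
    larger g_p -> closer to OLS, smaller g_p -> closer to the prior."""
    p = X.shape[1]
    beta_prior = np.zeros(p) if beta_prior is None else np.asarray(beta_prior)
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    g = np.asarray(g, dtype=float)
    # convex combination weighted by g_p / (1 + g_p)
    return (g / (1.0 + g)) * beta_ols + (1.0 / (1.0 + g)) * beta_prior
```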

25 of 41

BBSR model selection

  • The final step in BBSR is to determine the best model out of the 2^p possible predictor subsets
  • p cannot be very high: the approach restricts the number of candidate predictors per target to 10
  • The best model is the one that minimizes prediction error with the lowest model complexity, scored with the Bayesian Information Criterion (BIC)
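A brute-force sketch of best-subset selection scored with BIC (a stand-in illustration of the idea, not the paper's exact scoring):

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, max_p=10):
    """Exhaustively search all 2^p predictor subsets and return the one
    with the lowest BIC (prediction error penalized by model size)."""
    n, p = X.shape
    assert p <= max_p, "exhaustive search is only feasible for small p"
    best, best_bic = (), np.inf
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            if subset:
                beta, *_ = np.linalg.lstsq(X[:, subset], y, rcond=None)
                rss = np.sum((y - X[:, subset] @ beta) ** 2)
            else:
                rss = np.sum(y ** 2)          # intercept-free null model
            bic = n * np.log(rss / n) + len(subset) * np.log(n)
            if bic < best_bic:
                best, best_bic = subset, bic
    return best
```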

26 of 41

Experimental setup

  • Three datasets
    • DREAM4: In silico dataset with 100 nodes
    • E. coli dataset from DREAM5
    • B. subtilis dataset
  • Evaluation based on AUPR
    • Ranking of edges obtained from a bootstrapping strategy
  • Questions asked
    • How does the prior parameter affect the performance?
    • Does the prior hamper performance on parts of the network without prior support?
    • How robust is the framework to noisy priors?

27 of 41

Workflow of experiments

28 of 41

How does the prior parameter affect the performance?

29 of 41

Can the data discriminate between different types of prior edges?

In other words, is the incorporation of the prior data-driven?

Low-ranked prior interactions do not show a strong positive or negative correlation

30 of 41

Ability to recover new edges is not hampered on adding prior

DREAM4

E. coli

B. subtilis

Prior helps

Prior does not help

31 of 41

What happens when one adds noisy priors?

"Low" and "high" for BBSR and MEN refer to less dense or more dense priors

High noise regime

32 of 41

Summary

  • Extending the Inferelator linear regression model to incorporate priors
    • Regularized regression
    • Probabilistic priors on weights
  • Experiments suggest
    • The prior incorporation is data-driven
    • Adding prior is beneficial even when it is noisy

33 of 41

Goals for this lecture

  • Incorporating priors in Dependency networks using linear regression
  • Incorporating priors in Dependency networks using tree models

34 of 41

iRafNet

  • GENIE3 was shown to be one of the best performing expression-based algorithms
  • Can we extend the GENIE3 Random Forests based approach to incorporate priors?
  • iRafNet uses a weighted sampling scheme to incorporate information from different sources of data

Petralia et al. 2015, Bioinformatics

35 of 41

Weighted sampling algorithm in iRafNet

  • Each data source d provides a score for a regulator k and target j
  • Convert these scores to sampling weights, wk->j in a data source and score-specific way
  • For each node split, instead of sampling uniformly from N potential regulators, select a dataset d randomly and sample N regulators based on their weights in d
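The weighted candidate-sampling step can be sketched as follows (the data-source names and weight values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_split_candidates(source_weights, n_candidates):
    """source_weights: dict mapping data source -> sampling weights over
    regulators. Pick one data source at random, then draw candidate
    regulators for a node split (without replacement) by its weights."""
    source = rng.choice(sorted(source_weights))
    w = np.asarray(source_weights[source], dtype=float)
    # regulators with larger prior weight are sampled more often
    return rng.choice(len(w), size=n_candidates, replace=False, p=w / w.sum())
```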

36 of 41

iRafNet overview

Petralia et al. 2015, Bioinformatics

37 of 41

Constructing sampling weights

  • The prior knowledge is described as a set of weighted networks
  • Weights for selecting a regulator are derived in a dataset-specific manner
  • Undirected protein-protein interactions:
    • Weights derived from a diffusion process over graphs (we will see this in later lectures)
  • Time-series expression data:
    • The weight wj->k assesses how predictive gj's expression at time t is of gk's expression at the next time point t+1
    • Derive a P-value to assess the strength of the regression weight
    • Convert the P-value into a weight
  • Knockout data:
    • wj->k is derived in one of two ways:
    • If gk's expression changes significantly when gj is knocked out, wj->k is derived from the P-value
    • Otherwise it is derived based on the overlap of gj's and gk's knockout targets or knockout regulators

38 of 41

iRafNet application to real data

  • Ground truth
    • Significant interactions identified from ChIP-chip experiments of yeast
  • Expression dataset
    • This was a large study measuring gene expression in multiple yeast strains
  • Prior datasets (included other expression datasets)
    • Expression time course during cell cycle
    • Expression data of genetic knockouts of TFs
    • Protein-protein interactions from public databases (BioGRID, MINT, DIP)

39 of 41

Does adding prior help for iRafNet?

  • Evaluate on ChIP-chip network of yeast
  • Expression dataset
    • This was a large study measuring gene expression
  • Prior datasets (included other expression datasets)
    • Expression time course during cell cycle
    • Knockout data from Hu et al
    • Protein-protein interactions from public databases (BioGRID, MINT, DIP)

40 of 41

Concluding remarks

  • We have seen different ways to incorporate other data types to improve the quality of the inferred network
  • Bayesian networks with structure prior
    • Use an energy function to assess concordance
    • Sensitive to incorrect prior information
  • Dependency networks with priors
    • Linear regression approach aims to reduce the penalty on inferred edges
    • Tree-based approach enables a “biased” selection of regulators

41 of 41

Recent work with dependency networks and priors

  • Wang Y, Cho D-Y, Lee H, Fear J, Oliver B, Przytycka TM. Reprogramming of regulatory network using expression uncovers sex-specific gene regulation in Drosophila. Nat Commun. 2018;9(1):4061. doi:10.1038/s41467-018-06382-z

  • Miraldi ER, Pokrovskii M, Watters A, et al. Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells. Genome Res. 2019;29(3):449-463. doi:10.1101/gr.238253.118