1 of 49

Incorporating dynamics and graph priors in Bayesian networks

Sep 17th 2024

BMI/CS 775 Computational Network Biology, Fall 2024

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 49

Plan for this section

  • Representing molecular networks as probabilistic models (Sep 10th)
  • Learning Bayesian networks and extensions (Sep 10th, 12th, 17th)
    • Learning Dynamic Bayesian Networks
    • Adding priors
  • Learning Dependency networks (Sep 19th)
  • Learning causal graphs (Sep 24th, 26th)


3 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

4 of 49

The Sparse candidate algorithm for Structure learning in Bayesian networks

  • A fast Bayesian network learning algorithm
  • Key idea: Identify k “promising” candidate parents for each Xi
    • k<<p, p: number of random variables
    • Candidates define a “skeleton graph” H
  • Restrict graph structure to select parents from H
  • Early choices in H might exclude other good parents
    • Resolve using an iterative algorithm
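As a minimal sketch of the candidate-selection step, each potential parent can be scored by an estimate of mutual information with the target variable, keeping the top k per variable. This is only an illustration of the "promising candidates" idea (the full algorithm iterates and re-scores candidates against the current network); the function names and the equal-width binning are choices made for this example.

```python
import numpy as np

def mutual_info(x, y, bins=4):
    """Estimate mutual information between two continuous variables
    by discretizing each into equal-width bins."""
    c_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0   # avoid log(0) for empty cells
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def candidate_parents(data, k):
    """For each variable i, pick the k variables with the highest
    mutual information with i -- the initial skeleton H."""
    p = data.shape[1]
    H = {}
    for i in range(p):
        scores = [(mutual_info(data[:, j], data[:, i]), j)
                  for j in range(p) if j != i]
        scores.sort(reverse=True)
        H[i] = [j for _, j in scores[:k]]
    return H
```

Restricting parents of each Xi to H[i] shrinks the search space from all p-1 possible parents to k candidates.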


5 of 49

Some comments about choosing candidates

  • How to select k in the sparse candidate algorithm?
  • Should k be the same for all Xi ?
  • Estimate an undirected dependency network
    • Learn a Bayesian network constrained on the dependency network structure
  • Regularized regression approaches can be used to estimate the structure of an undirected graph
    • Schmidt, Niculescu-Mizil, Murphy 2007
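A minimal sketch of the regularized-regression idea, in the spirit of neighborhood selection: regress each variable on all the others with an L1 penalty, and connect i and j if either regression keeps the other. The simple coordinate-descent solver and function names here are illustrative assumptions, not the method of the cited paper.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """L1-regularized least squares via coordinate descent
    (columns of X assumed standardized)."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]        # partial residual
            rho = X[:, j] @ r / n
            z = (X[:, j] ** 2).sum() / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return w

def dependency_graph(data, lam=0.1):
    """Neighborhood selection: an undirected edge i-j exists if either
    per-variable lasso regression gives the other a nonzero weight."""
    X = (data - data.mean(0)) / data.std(0)
    p = X.shape[1]
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        w = lasso_cd(X[:, others], X[:, i], lam)
        for j, wj in zip(others, w):
            if abs(wj) > 1e-8:
                adj[i, j] = adj[j, i] = True
    return adj
```

The resulting undirected graph can then play the role of the skeleton H that constrains parent selection in the Bayesian network.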


6 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

7 of 49

Readings

  • Hill, Steven M., Yiling Lu, Jennifer Molina, Laura M. Heiser, Paul T. Spellman, Terence P. Speed, Joe W. Gray, Gordon B. Mills, and Sach Mukherjee. 2012. “Bayesian Inference of Signaling Network Topology in a Cancer Cell Line.” Bioinformatics (Oxford, England) 28 (21): 2804–2810. https://doi.org/10.1093/bioinformatics/bts514.

8 of 49

Gene expression dynamics in timeseries experiments

[Figure: heatmaps of gene expression (genes x time) from timeseries experiments: zebrafish development and human spinal cord differentiation. Chasman et al. 2017 Cell Systems; White et al. 2017 eLife]

9 of 49

Key questions about modeling network dynamics

  • What does modeling “dynamics” mean?
    • The activity of the nodes changes over time, and we want to model how this happens
    • The network (structure or parameters) changes with time
      • Structure can change due to changes at the node or edge level
  • What models can be used for capturing dynamics in networks?
  • How do these models capture dynamics?
    • Node level
    • Edge level

10 of 49

Dynamic Bayesian networks

  • Bayesian networks that we have seen so far do not model temporal dynamics
  • Dynamic Bayesian networks (DBNs) are a class of Bayesian networks for modeling temporal dynamics as measured in timeseries datasets

11 of 49

Dynamic Bayesian Networks (DBNs)

  • A DBN is a Bayesian Network for dynamic processes
  • Suppose we have a time course with T time points
  • Let X^t = {X_1^t, …, X_p^t} denote the set of p random variables at time t
  • Let X = {X^1, …, X^T}
  • A DBN over these variables defines the joint distribution P(X)
  • A DBN, like a BN, has a directed acyclic graph G and parameters Θ
  • G typically specifies the dependencies between time points
    • In addition we need to specify dependence structure (if any) at the initial time point (t=1)

12 of 49

A DBN for p variables and T time points

[Figure: a DBN unrolled over time. At t = 1, the variables X1^1, X2^1, …, Xp^1 have their own dependency structure (dependency at the first time point); each subsequent slice t = 2, 3, …, T holds X1^t, …, Xp^t, with edges pointing from slice t to slice t+1. X^3 denotes the set of variables at time t = 3.]

  • We assume that the measurements at time t+1 depend on those at time t
    • And that this dependence does not change with t
  • DBNs can be used to model how the system evolves over time
    • We may have a different network at the first time point

13 of 49

Stationary assumption in a DBN

The stationary assumption states that the dependency structure and parameters of P(X^{t+1} | X^t) do not change with t

Due to this assumption, we only need to specify dependencies between two consecutive sets of variables

[Figure: the two-slice structure between times t and t+1 (X1^t, …, Xp^t → X1^{t+1}, …, Xp^{t+1}), which is repeated to unroll the full network over the slices t = 1, 2, …, T.]

Later we will see an example of an approach where this assumption is relaxed

14 of 49

Computing the joint probability distribution in a DBN

  • The DBN specifies the joint probability distribution for an entire time series
  • It can be factored into a product of conditional distributions across time and variables:

    P(X) = P(X^1) ∏_{t=2..T} ∏_{i=1..p} P(X_i^t | Pa(X_i^t))

    where Pa(X_i^t), the parents of X_i^t, are defined by the graph G
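As a toy illustration of this factorization, consider a hypothetical DBN over two binary variables whose CPD tables are made up for this example; the log joint probability of a sequence is simply a sum of conditional log probabilities, one per variable per time point.

```python
import numpy as np

# Toy DBN over two binary variables. Parents (all from time t-1):
#   X1^t depends on X1^{t-1};  X2^t depends on X1^{t-1} and X2^{t-1}.
# Each CPD table gives P(child = 1 | parent configuration).
p_x1 = {0: 0.2, 1: 0.8}                      # P(X1^t=1 | X1^{t-1})
p_x2 = {(0, 0): 0.1, (0, 1): 0.5,
        (1, 0): 0.6, (1, 1): 0.9}            # P(X2^t=1 | X1^{t-1}, X2^{t-1})
p_init = {(0, 0): 0.4, (0, 1): 0.1,
          (1, 0): 0.2, (1, 1): 0.3}          # P(X1^1, X2^1): first slice

def bern(p, v):
    """P(value v) for a Bernoulli with success probability p."""
    return p if v == 1 else 1.0 - p

def log_joint(seq):
    """log P(X) = log P(X^1) + sum over t, i of log P(X_i^t | parents).
    seq is a list of (x1, x2) tuples, one per time point."""
    lp = np.log(p_init[seq[0]])
    for (a1, a2), (b1, b2) in zip(seq, seq[1:]):
        lp += np.log(bern(p_x1[a1], b1))
        lp += np.log(bern(p_x2[(a1, a2)], b2))
    return lp
```

Summing exp(log_joint) over every possible sequence of a fixed length gives 1, confirming that the factored terms define a valid joint distribution.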

15 of 49

Learning problems in DBNs

  • Parameter learning:
    • Given a DBN structure and a timeseries dataset D, estimate the parameters of each variable’s CPD

  • Structure learning:
    • Given a timeseries dataset D, learn the DBN structure

16 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

17 of 49

Why prior-based structure learning?

  • Learning networks with thousands of variables is computationally challenging
    • The space of possible graphs is huge
    • Insufficient number of training examples to learn these networks reliably
    • Multiple equivalent models can be learned
  • One type of data (expression) might not inform us of all the regulatory edges
  • Priors can encode additional information to constrain the graph structure
    • Priors could impose global structure or individual edge level information

18 of 49

Bayes rule

  P(A | B) = P(B | A) P(A) / P(B)

  posterior = (data likelihood x prior) / marginal likelihood

  • This allows us to incorporate evidence to inform our prior belief about a particular variable
  • Sometimes it is easier to compute P(B|A). Bayes rule can be used to compute P(A|B).
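A quick numeric illustration with made-up numbers, in the spirit of the edge priors discussed later: updating a prior belief that a regulatory edge exists after observing a ChIP peak. All probabilities here are hypothetical.

```python
# Hypothetical numbers: prior P(edge), and likelihoods of observing a
# ChIP peak with / without a true regulatory edge.
p_edge = 0.1
p_peak_given_edge = 0.8
p_peak_given_no_edge = 0.2

# Marginal likelihood P(peak) by the law of total probability.
p_peak = (p_peak_given_edge * p_edge
          + p_peak_given_no_edge * (1 - p_edge))

# Bayes rule: posterior P(edge | peak).
p_edge_given_peak = p_peak_given_edge * p_edge / p_peak
```

A peak raises the belief in the edge from 0.1 to roughly 0.31: the evidence shifts, but does not overwhelm, the prior.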

Adapted from Prof. Mark Craven’s slides for BMI/CS 576

19 of 49

Prior-based structure learning

  • Given
    • Measurements (e.g. gene expression) of all nodes across multiple samples
    • Complementary data that supports the presence of an edge
  • Do
    • Find a graph structure that describes the data well, subject to constraints in the prior
  • How?
    • Place a prior on the graph where the prior is obtained from complementary data

20 of 49

Bayesian formulation of network inference

Optimize the posterior distribution of the graph given data:

  P(G | D) ∝ P(D | G) P(G)

  posterior distribution ∝ data likelihood x model prior

[Figure: an inference algorithm takes the measurements and the model prior as input and outputs the network.]

21 of 49

Auxiliary data sources to serve as priors in regulatory networks

  • ChIP-chip and ChIP-seq

  • Sequence specific motifs in accessible regions of the genome using ATAC-seq and/or DNase-seq

  • Factor knockout followed by whole-transcriptome profiling

  • Literature

[Image: ChIP-seq peaks. Image credit: Alireza Fotuhi Siahpirani]

22 of 49

How to define priors on graphs?

  • One popular way is to use “energy functions” to define prior probabilities on a graph G
  • A function that measures agreement between a given graph G and prior knowledge
  • Allows one to incorporate both positive and negative prior knowledge

23 of 49

Energy function on a graph

  • A graph G is represented by a binary adjacency matrix
    • Gij = 0 if there is no edge from node i to node j
    • Gij = 1 if there is an edge from i to j
    • Gji = 1 if there is an edge from j to i
  • Encode a “prior” network as follows:
    • Bij= 0.5 if we don’t have any prior knowledge
    • 0≤ Bij<0.5 if we know that there is no edge
    • 1 ≥Bij>0.5 if we know there is an edge

24 of 49

Energy function of a graph

  • The energy E of a network G is defined as

    E(G) = Σ_ij | B_ij − G_ij |

  • E(G) = 0 when there is perfect agreement between the prior knowledge B and G
  • The higher the energy of G, the greater the mismatch

Werhli, Adriano V., and Dirk Husmeier. 2007; Mukherjee and Speed 2008

25 of 49

Using the energy to define a prior distribution of a graph

  • A prior distribution for a graph G can be defined using E(G):

    P(G | β) = (1/Z) exp(−β E(G))

  • This is also called a Gibbs distribution
  • β is the hyperparameter: the parameter of the prior distribution
  • Z = Σ_G exp(−β E(G)) is the partition function, which sums over all possible graphs

  • In general, the partition function is hard to compute
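For a toy graph space the Gibbs prior can be computed exactly by brute force. This is a sketch: the energy is the |B − G| mismatch defined above, and the 2-node example and β value are made up for illustration.

```python
import numpy as np
from itertools import product

def energy(G, B):
    """E(G) = sum_ij |B_ij - G_ij|: mismatch between graph and prior."""
    return np.abs(B - G).sum()

def gibbs_prior(graphs, B, beta):
    """P(G | beta) = exp(-beta * E(G)) / Z, with Z summed by brute
    force over the supplied graph space."""
    w = np.array([np.exp(-beta * energy(G, B)) for G in graphs])
    return w / w.sum()

# All directed graphs on 2 nodes without self-loops: 4 edge sets,
# indexed by (a, b) = (edge 0->1 present, edge 1->0 present).
graphs = [np.array([[0, a], [b, 0]]) for a, b in product([0, 1], repeat=2)]
B = np.array([[0.5, 0.9],     # strong prior evidence for edge 0 -> 1
              [0.5, 0.5]])    # no prior knowledge elsewhere
probs = gibbs_prior(graphs, B, beta=2.0)
```

Graphs containing the favored edge 0 → 1 receive most of the prior mass; with β = 0 the prior would be uniform, and larger β concentrates it further.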

26 of 49

Incorporating multiple sources of prior networks

  • Suppose we have two sources of prior knowledge
  • We can represent them as two prior networks, B^1 and B^2
  • And define the energy of G with respect to each of them:

    E_k(G) = Σ_ij | B^k_ij − G_ij |,  k = 1, 2

27 of 49

Prior distribution incorporating multiple prior networks

  • The prior takes the form of another Gibbs distribution:

    P(G | β_1, β_2) = (1/Z) exp(−β_1 E_1(G) − β_2 E_2(G))

  • This can be extended to more prior networks in the same way
  • The partition functions are in general hard to compute
  • However, for (some) DBNs, these partition functions can be computed efficiently

28 of 49

The partition function for a prior over DBN

  • The energy for a DBN or a BN:

    E(G) = Σ_ij | B_ij − G_ij |

  • Which can be re-written as a sum of local terms, one per variable:

    E(G) = Σ_i E_i(Pa_i)

    where E_i(Pa_i) depends only on Pa_i, the parents of the ith variable

29 of 49

The partition function for a DBN prior

  • Plugging in into the partition function we get:

  • Re-written as a sum over parent sets:

  • Because of exponents, we can rewrite the above as:

30 of 49

The partition function for a DBN prior

  • The partition function of the DBN can be computed from local, easy-to-compute components
  • Each sum is over the possible configurations of one parent set
  • If we restrict the number of parents to m, this is polynomial in p
  • If we allow edges only between time points, we don’t need to check the DAG constraint
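This factorization can be checked numerically on a small example: for a transition graph where each variable independently picks parents from the previous time slice, the brute-force partition function over all graphs equals the product of per-variable local sums. The prior matrix and β below are arbitrary.

```python
import numpy as np
from itertools import product

p, beta = 3, 1.0
rng = np.random.default_rng(1)
B = rng.uniform(0, 1, size=(p, p))   # prior confidence B[j, i] for edge j -> i

def local_energy(i, parent_mask):
    """E_i: mismatch for variable i's incoming edges from the previous
    time slice (edge j -> i is present iff parent_mask[j] == 1)."""
    return sum(abs(B[j, i] - parent_mask[j]) for j in range(p))

# Brute force: sum exp(-beta * E(G)) over all 2^(p*p) transition graphs,
# each graph given by one parent mask per variable.
Z_brute = 0.0
for masks in product(product([0, 1], repeat=p), repeat=p):
    Z_brute += np.exp(-beta * sum(local_energy(i, masks[i]) for i in range(p)))

# Factorized: since parent choices are independent across variables,
# Z is a product of per-variable sums over parent configurations.
Z_fact = 1.0
for i in range(p):
    Z_fact *= sum(np.exp(-beta * local_energy(i, m))
                  for m in product([0, 1], repeat=p))
```

The brute-force sum touches 2^(p·p) graphs, while the factorized form needs only p sums of 2^p terms each (and only polynomially many terms once parent-set size is capped at m).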

31 of 49

Learning problems in DBNs with priors

  • How to set the hyper-parameter of our prior distribution?
    • Markov Chain Monte Carlo (Werhli & Husmeier 2007; Merrell & Gitter 2020)
    • Empirical Bayes (Hill et al 2012)
  • How to do structure learning?
    • Greedy algorithm
    • Markov Chain Monte Carlo

32 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

33 of 49

Bayesian Inference of Signaling Network Topology in a Cancer Cell Line (Hill et al 2012)

  • Protein signaling networks play an important role in many diseases
    • These networks can differ between normal and disease cell types
  • But knowledge of the network structure remains incomplete
  • The temporal activity of proteins of interest can be measured and used to infer the network structure
  • Build on prior knowledge of signaling networks to learn a better, predictive network

34 of 49

Applying DBNs to infer signaling network topology

Hill et al., Bioinformatics 2012

35 of 49

Application of DBNs to signaling networks

  • Dataset description
    • Phospho-protein levels of 20 proteins
    • Eight time points
    • Four growth conditions
  • Use known signaling network as a graph prior
  • Estimate CPDs as conditional regularized Gaussians
  • Assume a first-order Markov model
    • Xt depends only on Xt-1

36 of 49

Integrating prior signaling network into the DBN

  • A Bayesian approach to graph learning:

    P(G | D) ∝ P(D | G) P(G)
               (data likelihood x graph prior)

  • The graph prior is encoded as an energy function over graph features:

    P(G) ∝ exp(λ f(G))

    where λ is the prior strength and f(G) measures graph features against the known signaling network

Original prior description from Mukherjee & Speed 2008, PNAS

37 of 49

Defining the prior

  • Here f(G) = −|E(G) \ E*|: the negative of the number of edges in the graph G, E(G), that are not in the prior edge set E*
  • This prior does not promote new edges, but penalizes edges that are not in the prior

38 of 49

Calculating posterior probabilities of edges

  • For each edge e, we need to calculate the posterior probability of the edge given the data, summing over all graphs that contain it:

    P(e ∈ G | D) = Σ_{G: e ∈ G} P(G | D)

  • To make this sum over graphs tractable:
    • Assume P(G) factorizes over individual edges
    • Assume a node can have no more than dmax parents
    • Allow edges only forward in time
      • The learning problem then decomposes into smaller per-variable problems that can be solved by variable selection over parent sets

39 of 49

Inferred signaling network using a DBN

Results are not sensitive to prior values

[Figure: the signaling network inferred by the DBN and its collapsed network]

40 of 49

Using the DBN to make predictions

  • Although many edges were expected, several edges were unexpected
  • Select novel edges based on posterior probability and test them based on inhibitors
  • For example, if an edge was observed from X to Y, inhibition of X should affect the value of Y if X is a causal regulator of Y
  • Example edge tested
    • MAPKp to STAT3p(S727) with high probability (0.98)
      • Apply MEKi, which is an inhibitor of MAPK, and measure MAPKp and STAT3p post inhibition
    • AKTp to p70S6Kp, AKTp to MEKp and AKTp to cJUNp

41 of 49

Experimental validation of links

Add MAPK inhibitor and measure MAPK and STAT3

Success is measured by the change in the levels of the targets as a function of the levels of the inhibitors

MAPK is significantly inhibited (P-value 5x10^-4)

STAT3 is also inhibited (P-value 3.3x10^-4)

[Figure: MAPK and STAT3 levels before and after inhibition]

42 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

43 of 49

Motivation

  • Consider a developmental regulatory network or the neural network of a songbird
  • Both of these networks gradually adapt over time
  • Such dynamics are defined by a network that is "evolving" over time
  • Such a time course has multiple time segments, each segment governed by a "different" network

44 of 49

Non-stationary Dynamic Bayesian Networks (nsDBNs)

  • The standard DBN makes the stationarity assumption
    • Distributions do not change depending upon where you are in the time course
  • Non-stationary DBNs are DBNs where this assumption is relaxed
    • Introduced by Robinson and Hartemink 2010
    • Depending upon the time-window/epoch we might have a different dependency structure

45 of 49

Non-stationary Dynamic Bayesian Networks

  • Suppose we have 3 time windows or epochs
  • nsDBNs require us to define the dependency structure in each time window

Adapted from Robinson & Hartemink 2010

[Figure: a four-node network over X1, X2, X3, X4 whose edge set changes across time windows. Transition times t1 and t2 split the time course into three epochs, governed by graphs G1, G2, and G3.]

46 of 49

Non-stationary Dynamic Bayesian Networks continued

  • For m epochs, we would like to find G1, …, Gm by optimizing their posterior distribution:

    P(G1, …, Gm | D) ∝ P(D | G1, …, Gm) P(G1, …, Gm)

    The prior over the m graphs can be used to incorporate our prior knowledge of how the graphs transition

47 of 49

Take away points

  • Bayesian networks are popular approaches for modeling gene expression data and networks
  • To learn Bayesian networks from data we need to:
    • Learn the structure
    • Learn the parameters
  • Structure learning requires us to
    • Search for candidate graphs
    • Score each graph with respect to the data
  • Dynamic Bayesian networks allow us to model temporal dependencies
    • Standard DBNs require a stationarity assumption, but nsDBNs relax this

48 of 49

Take away points contd

  • Addition of prior knowledge can help
    • integrate other sources of data
    • constrain the graph structure in specific ways
  • How to estimate the strength of the prior?
    • Empirical Bayes
    • Markov Chain Monte Carlo (MCMC)
  • Adding prior helped network structure learning

49 of 49

References

  • Husmeier, Dirk. 2005. Introduction to Learning Bayesian Networks from Data.
  • Werhli, Adriano V., and Dirk Husmeier. 2007. “Reconstructing Gene Regulatory Networks with Bayesian Networks by Combining Expression Data with Multiple Sources of Prior Knowledge.”
  • Hill, Steven M., Yiling Lu, Jennifer Molina, Laura M. Heiser, Paul T. Spellman, Terence P. Speed, Joe W. Gray, Gordon B. Mills, and Sach Mukherjee. 2012. “Bayesian Inference of Signaling Network Topology in a Cancer Cell Line.” Bioinformatics 28 (21): 2804–2810.