1 of 49

Incorporating dynamics and graph priors in Bayesian networks

Sep 17th 2024

BMI/CS 775 Computational Network Biology, Fall 2024

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 49

Plan for this section

  • Representing molecular networks as probabilistic models (Sep 10th)
  • Learning Bayesian networks and extensions (Sep 10th, 12th, 17th)
    • Learning Dynamic Bayesian Networks
    • Adding priors
  • Learning Dependency networks (Sep 19th)
  • Learning causal graphs (Sep 24th, 26th)


3 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

4 of 49

The Sparse candidate algorithm for Structure learning in Bayesian networks

  • A fast Bayesian network learning algorithm
  • Key idea: Identify k “promising” candidate parents for each Xi
    • k<<p, p: number of random variables
    • Candidates define a “skeleton graph” H
  • Restrict graph structure to select parents from H
  • Early choices in H might exclude other good parents
    • Resolve using an iterative algorithm
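As a minimal sketch of the candidate-selection step, each potential parent can be scored by an estimate of mutual information with the target variable, keeping the top k per variable. This is only an illustration of the "promising candidates" idea (the full algorithm iterates and re-scores candidates against the current network); the function names and the equal-width binning are choices made for this example.

```python
import numpy as np

def mutual_info(x, y, bins=4):
    """Estimate mutual information between two continuous variables
    by discretizing each into equal-width bins."""
    c_xy, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = c_xy / c_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0   # avoid log(0) for empty cells
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())

def candidate_parents(data, k):
    """For each variable i, pick the k variables with the highest
    mutual information with i -- the initial skeleton H."""
    p = data.shape[1]
    H = {}
    for i in range(p):
        scores = [(mutual_info(data[:, j], data[:, i]), j)
                  for j in range(p) if j != i]
        scores.sort(reverse=True)
        H[i] = [j for _, j in scores[:k]]
    return H
```

Restricting parents of each Xi to H[i] shrinks the search space from all p-1 possible parents to k candidates.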


5 of 49

Some comments about choosing candidates

  • How to select k in the sparse candidate algorithm?
  • Should k be the same for all Xi ?
  • Estimate an undirected dependency network
    • Learn a Bayesian network constrained on the dependency network structure
  • Regularized regression approaches can be used to estimate the structure of an undirected graph
    • Schmidt, Niculescu-Mizil, Murphy 2007
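A minimal sketch of the regularized-regression idea, in the spirit of neighborhood selection: regress each variable on all the others with an L1 penalty, and connect i and j if either regression keeps the other. The simple coordinate-descent solver and function names here are illustrative assumptions, not the method of the cited paper.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """L1-regularized least squares via coordinate descent
    (columns of X assumed standardized)."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]        # partial residual
            rho = X[:, j] @ r / n
            z = (X[:, j] ** 2).sum() / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return w

def dependency_graph(data, lam=0.1):
    """Neighborhood selection: an undirected edge i-j exists if either
    per-variable lasso regression gives the other a nonzero weight."""
    X = (data - data.mean(0)) / data.std(0)
    p = X.shape[1]
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        w = lasso_cd(X[:, others], X[:, i], lam)
        for j, wj in zip(others, w):
            if abs(wj) > 1e-8:
                adj[i, j] = adj[j, i] = True
    return adj
```

The resulting undirected graph can then play the role of the skeleton H that constrains parent selection in the Bayesian network.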


6 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

7 of 49

Readings

  • Hill, Steven M., Yiling Lu, Jennifer Molina, Laura M. Heiser, Paul T. Spellman, Terence P. Speed, Joe W. Gray, Gordon B. Mills, and Sach Mukherjee. 2012. “Bayesian Inference of Signaling Network Topology in a Cancer Cell Line.” Bioinformatics (Oxford, England) 28 (21): 2804–2810. https://doi.org/10.1093/bioinformatics/bts514.

8 of 49

Gene expression dynamics in timeseries experiments

[Figure: heatmaps of gene expression (genes x time) from timeseries experiments: zebrafish development and human spinal cord differentiation. Chasman et al. 2017 Cell Systems; White et al. 2017 eLife]

9 of 49

Key questions about modeling network dynamics

  • What does modeling “dynamics” mean?
    • The activity of the nodes changes over time, and we want to model how this happens
    • The network (structure or parameters) changes with time
      • Structure can change due to changes at the node or edge level
  • What models can be used for capturing dynamics in networks?
  • How do these models capture dynamics?
    • Node level
    • Edge level

10 of 49

Dynamic Bayesian networks

  • Bayesian networks that we have seen so far do not model temporal dynamics
  • Dynamic Bayesian networks (DBNs) are a class of Bayesian networks for modeling temporal dynamics as measured in timeseries datasets

11 of 49

Dynamic Bayesian Networks (DBNs)

  • A DBN is a Bayesian Network for dynamic processes
  • Suppose we have a time course with T time points
  • Let X^t = {X_1^t, …, X_p^t} denote the set of p random variables at time t
  • Let X = {X^1, …, X^T}
  • A DBN over these variables defines the joint distribution P(X)
  • A DBN, like a BN, has a directed acyclic graph G and parameters Θ
  • G typically specifies the dependencies between time points
    • In addition we need to specify dependence structure (if any) at the initial time point (t=1)

12 of 49

A DBN for p variables and T time points

[Figure: a DBN unrolled over time. At t = 1, the variables X1^1, X2^1, …, Xp^1 have their own dependency structure (dependency at the first time point); each subsequent slice t = 2, 3, …, T holds X1^t, …, Xp^t, with edges pointing from slice t to slice t+1. X^3 denotes the set of variables at time t = 3.]

  • We assume that the measurements at time t+1 depend on those at time t
    • And that this dependence does not change with t
  • DBNs can be used to model how the system evolves over time
    • We may have a different network at the first time point

13 of 49

Stationary assumption in a DBN

The stationary assumption states that the dependency structure and parameters of P(X^{t+1} | X^t) do not change with t

Due to this assumption, we only need to specify dependencies between two consecutive sets of variables

[Figure: the two-slice structure between times t and t+1 (X1^t, …, Xp^t → X1^{t+1}, …, Xp^{t+1}), which is repeated to unroll the full network over the slices t = 1, 2, …, T.]

Later we will see an example of an approach where this assumption is relaxed

14 of 49

Computing the joint probability distribution in a DBN

  • The DBN specifies the joint probability distribution for an entire time series
  • It can be factored into a product of conditional distributions across time and variables:

    P(X) = P(X^1) ∏_{t=2..T} ∏_{i=1..p} P(X_i^t | Pa(X_i^t))

    where Pa(X_i^t), the parents of X_i^t, are defined by the graph G
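As a toy illustration of this factorization, consider a hypothetical DBN over two binary variables whose CPD tables are made up for this example; the log joint probability of a sequence is simply a sum of conditional log probabilities, one per variable per time point.

```python
import numpy as np

# Toy DBN over two binary variables. Parents (all from time t-1):
#   X1^t depends on X1^{t-1};  X2^t depends on X1^{t-1} and X2^{t-1}.
# Each CPD table gives P(child = 1 | parent configuration).
p_x1 = {0: 0.2, 1: 0.8}                      # P(X1^t=1 | X1^{t-1})
p_x2 = {(0, 0): 0.1, (0, 1): 0.5,
        (1, 0): 0.6, (1, 1): 0.9}            # P(X2^t=1 | X1^{t-1}, X2^{t-1})
p_init = {(0, 0): 0.4, (0, 1): 0.1,
          (1, 0): 0.2, (1, 1): 0.3}          # P(X1^1, X2^1): first slice

def bern(p, v):
    """P(value v) for a Bernoulli with success probability p."""
    return p if v == 1 else 1.0 - p

def log_joint(seq):
    """log P(X) = log P(X^1) + sum over t, i of log P(X_i^t | parents).
    seq is a list of (x1, x2) tuples, one per time point."""
    lp = np.log(p_init[seq[0]])
    for (a1, a2), (b1, b2) in zip(seq, seq[1:]):
        lp += np.log(bern(p_x1[a1], b1))
        lp += np.log(bern(p_x2[(a1, a2)], b2))
    return lp
```

Summing exp(log_joint) over every possible sequence of a fixed length gives 1, confirming that the factored terms define a valid joint distribution.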

15 of 49

Learning problems in DBNs

  • Parameter learning:
    • Given a DBN structure and a timeseries dataset D, estimate the parameters of each variable’s CPD

  • Structure learning:
    • Given a timeseries dataset D, learn the DBN structure

16 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

17 of 49

Why prior-based structure learning?

  • Learning networks with thousands of variables is computationally challenging
    • The space of possible graphs is huge
    • Insufficient number of training examples to learn these networks reliably
    • Multiple equivalent models can be learned
  • One type of data (expression) might not inform us of all the regulatory edges
  • Priors can encode additional information to constrain the graph structure
    • Priors could impose global structure or individual edge level information

18 of 49

Bayes rule

  P(A | B) = P(B | A) P(A) / P(B)

  posterior = (data likelihood x prior) / marginal likelihood

  • This allows us to incorporate evidence to inform our prior belief about a particular variable
  • Sometimes it is easier to compute P(B|A). Bayes rule can be used to compute P(A|B).
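A quick numeric illustration with made-up numbers, in the spirit of the edge priors discussed later: updating a prior belief that a regulatory edge exists after observing a ChIP peak. All probabilities here are hypothetical.

```python
# Hypothetical numbers: prior P(edge), and likelihoods of observing a
# ChIP peak with / without a true regulatory edge.
p_edge = 0.1
p_peak_given_edge = 0.8
p_peak_given_no_edge = 0.2

# Marginal likelihood P(peak) by the law of total probability.
p_peak = (p_peak_given_edge * p_edge
          + p_peak_given_no_edge * (1 - p_edge))

# Bayes rule: posterior P(edge | peak).
p_edge_given_peak = p_peak_given_edge * p_edge / p_peak
```

A peak raises the belief in the edge from 0.1 to roughly 0.31: the evidence shifts, but does not overwhelm, the prior.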

Adapted from Prof. Mark Craven’s slides for BMI/CS 576

19 of 49

Prior-based structure learning

  • Given
    • Measurements (e.g. gene expression) of all nodes across multiple samples
    • Complementary data that supports the presence of an edge
  • Do
    • Find a graph structure that describes the data well, subject to constraints in the prior
  • How?
    • Place a prior on the graph where the prior is obtained from complementary data

20 of 49

Bayesian formulation of network inference

Optimize the posterior distribution of the graph given data:

  P(G | D) ∝ P(D | G) P(G)

  posterior distribution ∝ data likelihood x model prior

[Figure: an inference algorithm takes the measurements and the model prior as input and outputs the network.]

21 of 49

Auxiliary data sources to serve as priors in regulatory networks

  • ChIP-chip and ChIP-seq

  • Sequence specific motifs in accessible regions of the genome using ATAC-seq and/or DNase-seq

  • Factor knockout followed by whole-transcriptome profiling

  • Literature

[Image: ChIP-seq peaks. Image credit: Alireza Fotuhi Siahpirani]

22 of 49

How to define priors on graphs?

  • One popular way is to use “energy functions” to define prior probabilities on a graph G
  • A function that measures agreement between a given graph G and prior knowledge
  • Allows one to incorporate both positive and negative prior knowledge

23 of 49

Energy function on a graph

  • A graph G is represented by a binary adjacency matrix
    • Gij = 0 if there is no edge from node i to node j
    • Gij = 1 if there is an edge from i to j
    • Gji = 1 if there is an edge from j to i
  • Encode a “prior” network as follows:
    • Bij= 0.5 if we don’t have any prior knowledge
    • 0≤ Bij<0.5 if we know that there is no edge
    • 1 ≥Bij>0.5 if we know there is an edge

24 of 49

Energy function of a graph

  • The energy E of a network G is defined as

    E(G) = Σ_ij | B_ij − G_ij |

  • E(G) = 0 when there is perfect agreement between the prior knowledge B and G
  • The higher the energy of G, the greater the mismatch

Werhli, Adriano V., and Dirk Husmeier. 2007; Mukherjee and Speed 2008

25 of 49

Using the energy to define a prior distribution of a graph

  • A prior distribution for a graph G can be defined using E(G):

    P(G | β) = (1/Z) exp(−β E(G))

  • This is also called a Gibbs distribution
  • β is the hyperparameter: the parameter of the prior distribution
  • Z = Σ_G exp(−β E(G)) is the partition function, which sums over all possible graphs

  • In general, the partition function is hard to compute
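For a toy graph space the Gibbs prior can be computed exactly by brute force. This is a sketch: the energy is the |B − G| mismatch defined above, and the 2-node example and β value are made up for illustration.

```python
import numpy as np
from itertools import product

def energy(G, B):
    """E(G) = sum_ij |B_ij - G_ij|: mismatch between graph and prior."""
    return np.abs(B - G).sum()

def gibbs_prior(graphs, B, beta):
    """P(G | beta) = exp(-beta * E(G)) / Z, with Z summed by brute
    force over the supplied graph space."""
    w = np.array([np.exp(-beta * energy(G, B)) for G in graphs])
    return w / w.sum()

# All directed graphs on 2 nodes without self-loops: 4 edge sets,
# indexed by (a, b) = (edge 0->1 present, edge 1->0 present).
graphs = [np.array([[0, a], [b, 0]]) for a, b in product([0, 1], repeat=2)]
B = np.array([[0.5, 0.9],     # strong prior evidence for edge 0 -> 1
              [0.5, 0.5]])    # no prior knowledge elsewhere
probs = gibbs_prior(graphs, B, beta=2.0)
```

Graphs containing the favored edge 0 → 1 receive most of the prior mass; with β = 0 the prior would be uniform, and larger β concentrates it further.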

26 of 49

Incorporating multiple sources of prior networks

  • Suppose we have two sources of prior knowledge
  • We can represent them as two prior networks, B^1 and B^2
  • And define the energy of G with respect to each of them:

    E_k(G) = Σ_ij | B^k_ij − G_ij |,  k = 1, 2

27 of 49

Prior distribution incorporating multiple prior networks

  • The prior takes the form of another Gibbs distribution:

    P(G | β_1, β_2) = (1/Z) exp(−β_1 E_1(G) − β_2 E_2(G))

  • This can be extended to more prior networks in the same way
  • The partition functions are in general hard to compute
  • However, for (some) DBNs, these partition functions can be computed efficiently

28 of 49

The partition function for a prior over DBN

  • The energy for a DBN or a BN:

    E(G) = Σ_ij | B_ij − G_ij |

  • Which can be re-written as a sum of local terms, one per variable:

    E(G) = Σ_i E_i(Pa_i)

    where E_i(Pa_i) depends only on Pa_i, the parents of the ith variable

29 of 49

The partition function for a DBN prior

  • Plugging in into the partition function we get:

  • Re-written as a sum over parent sets:

  • Because of exponents, we can rewrite the above as:

30 of 49

The partition function for a DBN prior

  • The partition function of the DBN can be computed from local, easy-to-compute components
  • Each sum is over the possible configurations of one parent set
  • If we restrict the number of parents to m, this is polynomial in p
  • If we allow edges only between time points, we don’t need to check the DAG constraint
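This factorization can be checked numerically on a small example: for a transition graph where each variable independently picks parents from the previous time slice, the brute-force partition function over all graphs equals the product of per-variable local sums. The prior matrix and β below are arbitrary.

```python
import numpy as np
from itertools import product

p, beta = 3, 1.0
rng = np.random.default_rng(1)
B = rng.uniform(0, 1, size=(p, p))   # prior confidence B[j, i] for edge j -> i

def local_energy(i, parent_mask):
    """E_i: mismatch for variable i's incoming edges from the previous
    time slice (edge j -> i is present iff parent_mask[j] == 1)."""
    return sum(abs(B[j, i] - parent_mask[j]) for j in range(p))

# Brute force: sum exp(-beta * E(G)) over all 2^(p*p) transition graphs,
# each graph given by one parent mask per variable.
Z_brute = 0.0
for masks in product(product([0, 1], repeat=p), repeat=p):
    Z_brute += np.exp(-beta * sum(local_energy(i, masks[i]) for i in range(p)))

# Factorized: since parent choices are independent across variables,
# Z is a product of per-variable sums over parent configurations.
Z_fact = 1.0
for i in range(p):
    Z_fact *= sum(np.exp(-beta * local_energy(i, m))
                  for m in product([0, 1], repeat=p))
```

The brute-force sum touches 2^(p·p) graphs, while the factorized form needs only p sums of 2^p terms each (and only polynomially many terms once parent-set size is capped at m).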

31 of 49

Learning problems in DBNs with priors

  • How to set the hyper-parameter of our prior distribution?
    • Markov Chain Monte Carlo (Werhli & Husmeier 2007; Merrell & Gitter 2020)
    • Empirical Bayes (Hill et al 2012)
  • How to do structure learning?
    • Greedy algorithm
    • Markov Chain Monte Carlo

32 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

33 of 49

Bayesian Inference of Signaling Network Topology in a Cancer Cell Line (Hill et al 2012)

  • Protein signaling networks play an important role in many diseases
    • These networks can differ between normal and disease cell types
  • But knowledge of the network structure remains incomplete
  • The temporal activity of proteins of interest can be measured and used to infer the network structure
  • Build on prior knowledge of signaling networks to learn a better, predictive network

34 of 49

Applying DBNs to infer signaling network topology

Hill et al., Bioinformatics 2012

35 of 49

Application of DBNs to signaling networks

  • Dataset description
    • Phospho-protein levels of 20 proteins
    • Eight time points
    • Four growth conditions
  • Use known signaling network as a graph prior
  • Estimate CPDs as conditional regularized Gaussians
  • Assume a first-order Markov model
    • Xt depends only on Xt-1

36 of 49

Integrating prior signaling network into the DBN

  • A Bayesian approach to graph learning:

    P(G | D) ∝ P(D | G) P(G)
               (data likelihood x graph prior)

  • The graph prior is encoded as an energy function over graph features:

    P(G) ∝ exp(λ f(G))

    where λ is the prior strength and f(G) measures graph features against the known signaling network

Original prior description from Mukherjee & Speed 2008, PNAS

37 of 49

Defining the prior

  • Here f(G) = −|E(G) \ E*|: the negative of the number of edges in the graph G, E(G), that are not in the prior edge set E*
  • This prior does not promote new edges, but penalizes edges that are not in the prior

38 of 49

Calculating posterior probabilities of edges

  • For each edge e, we need to calculate the posterior probability of the edge given the data, summing over all graphs that contain it:

    P(e ∈ G | D) = Σ_{G: e ∈ G} P(G | D)

  • To make this sum over graphs tractable:
    • Assume P(G) factorizes over individual edges
    • Assume a node can have no more than dmax parents
    • Allow edges only forward in time
      • The learning problem then decomposes into smaller per-variable problems that can be solved by variable selection over parent sets

39 of 49

Inferred signaling network using a DBN

Results are not sensitive to prior values

[Figure: the signaling network inferred by the DBN and its collapsed network]

40 of 49

Using the DBN to make predictions

  • Although many edges were expected, several edges were unexpected
  • Select novel edges based on posterior probability and test them based on inhibitors
  • For example, if an edge was observed from X to Y, inhibition of X should affect the value of Y if X is a causal regulator of Y
  • Example edge tested
    • MAPKp to STAT3p(S727) with high probability (0.98)
      • Apply MEKi, which is an inhibitor of MAPK, and measure MAPKp and STAT3p post inhibition
    • AKTp to p70S6Kp, AKTp to MEKp and AKTp to cJUNp

41 of 49

Experimental validation of links

Add MAPK inhibitor and measure MAPK and STAT3

Success is measured by the change in the levels of the targets as a function of the levels of the inhibitors

MAPK is significantly inhibited (P-value 5x10^-4)

STAT3 is also inhibited (P-value 3.3x10^-4)

[Figure: MAPK and STAT3 levels before and after inhibition]

42 of 49

Plan for today

  • RECAP of Bayesian networks and sparse candidates
  • Incorporating dynamics in Bayesian networks: Dynamic Bayesian networks
  • Incorporating priors on graph structure
  • Applications of Dynamic Bayesian networks with priors
  • Non-stationary Dynamic Bayesian Networks

43 of 49

Motivation

  • Consider a developmental regulatory network or the neural network of a songbird
  • Both of these networks gradually adapt over time
  • Such dynamics are defined by a network that is "evolving" over time
  • Such a time course has multiple time segments, each segment governed by a "different" network

44 of 49

Non-stationary Dynamic Bayesian Networks (nsDBNs)

  • The standard DBN makes the stationarity assumption
    • Distributions do not change depending upon where you are in the time course
  • Non-stationary DBNs are DBNs where this assumption is relaxed
    • Introduced by Robinson and Hartemink 2010
    • Depending upon the time-window/epoch we might have a different dependency structure

45 of 49

Non-stationary Dynamic Bayesian Networks

  • Suppose we have 3 time windows or epochs
  • nsDBNs require us to define the dependency structure in each time window

Adapted from Robinson & Hartemink 2010

[Figure: a four-node network over X1, X2, X3, X4 whose edge set changes across time windows. Transition times t1 and t2 split the time course into three epochs, governed by graphs G1, G2, and G3.]

46 of 49

Non-stationary Dynamic Bayesian Networks continued

  • For m epochs, we would like to find G1, …, Gm by optimizing their posterior distribution:

    P(G1, …, Gm | D) ∝ P(D | G1, …, Gm) P(G1, …, Gm)

    The prior over the m graphs can be used to incorporate our prior knowledge of how the graphs transition

47 of 49

Take away points

  • Bayesian networks are popular approaches for modeling gene expression data and networks
  • To learn Bayesian networks from data we need to:
    • Learn the structure
    • Learn the parameters
  • Structure learning requires us to
    • Search for candidate graphs
    • Score each graph with respect to the data
  • Dynamic Bayesian networks allow us to model temporal dependencies
    • Standard DBNs require a stationarity assumption, but nsDBNs relax this

48 of 49

Take away points contd

  • Addition of prior knowledge can help
    • integrate other sources of data
    • constrain the graph structure in specific ways
  • How to estimate the strength of the prior?
    • Empirical Bayes
    • Markov Chain Monte Carlo (MCMC)
  • Adding prior helped network structure learning

49 of 49

References

  • Husmeier, Dirk. 2005. Introduction to Learning Bayesian Networks from Data.
  • Werhli, Adriano V., and Dirk Husmeier. 2007. “Reconstructing Gene Regulatory Networks with Bayesian Networks by Combining Expression Data with Multiple Sources of Prior Knowledge.”
  • Hill, Steven M., Yiling Lu, Jennifer Molina, Laura M. Heiser, Paul T. Spellman, Terence P. Speed, Joe W. Gray, Gordon B. Mills, and Sach Mukherjee. 2012. “Bayesian Inference of Signaling Network Topology in a Cancer Cell Line.” Bioinformatics 28 (21): 2804–2810.