Incorporating graph priors in Bayesian networks
Sep 28th 2021
BMI 826-23 Computational Network Biology, Fall 2021
Sushmita Roy
Plan for this section
RECAP: Expression-based network inference methods
Plan for today
Readings
Why prior-based structure learning?
Types of integrative inference frameworks
Unsupervised network inference
Auxiliary data sources to serve as priors
ChIP-seq peaks
Image credit: Alireza Fotuhi Siahpirani
Classes of methods for incorporating priors
Prior-based approaches for network inference
Plan for today
Bayesian formulation of network inference
Optimize posterior distribution of graph given data
Algorithm
[Figure: a small example graph over variables X1, X2, X5 and targets Y1, Y2, annotated with the decomposition: posterior distribution ∝ data likelihood × model prior]
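Spelling out the decomposition the slide labels: by Bayes' rule, the posterior over graphs is proportional to the data likelihood times the model prior:

```latex
P(G \mid D) \;=\; \frac{P(D \mid G)\, P(G)}{P(D)} \;\propto\; \underbrace{P(D \mid G)}_{\text{data likelihood}} \;\; \underbrace{P(G)}_{\text{model prior}}
```

Structure learning then seeks the graph (or a posterior sample of graphs) maximizing or characterizing this distribution.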
A few computational concepts
Energy function of a graph G
Using the energy to define a prior distribution of a graph
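As a minimal sketch of this construction, assuming the commonly used absolute-difference energy E(G) = Σ_ij |B_ij − G_ij| between the graph G and a prior network B (as in the Imoto et al. / Husmeier formulation); function names here are illustrative:

```python
import numpy as np

def energy(G, B):
    """Energy of a graph G (0/1 adjacency matrix) relative to a prior
    network B, whose entries in [0, 1] give the prior confidence in each
    edge. Agreement with the prior gives low energy; disagreement is
    penalized."""
    return np.abs(B - G).sum()

def unnormalized_prior(G, B, beta=1.0):
    """P(G) is proportional to exp(-beta * E(G)); the hyperparameter
    beta >= 0 controls how strongly the prior network is trusted."""
    return np.exp(-beta * energy(G, B))

# Toy example: 3 nodes. B says edge 0->1 is likely, edge 0->2 unlikely.
B = np.array([[0.0, 0.9, 0.1],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
G_agree    = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
G_disagree = np.array([[0, 0, 1], [0, 0, 0], [0, 0, 0]])
print(energy(G_agree, B))     # 0.7: matches the prior well
print(energy(G_disagree, B))  # 2.3: conflicts with the prior
```

A graph consistent with the prior network thus receives higher (unnormalized) prior probability, with β = 0 recovering a uniform prior.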
Incorporating multiple sources of prior networks
Prior distribution incorporating multiple prior networks
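With K prior networks, each contributing its own energy E_k(G) weighted by a hyperparameter β_k, the prior takes the Gibbs form below (as in Werhli and Husmeier's multi-source formulation); Z normalizes over the space of graphs:

```latex
P(G \mid \beta_1, \dots, \beta_K) \;=\; \frac{1}{Z(\beta_1, \dots, \beta_K)} \exp\!\Big(-\sum_{k=1}^{K} \beta_k\, E_k(G)\Big),
\qquad
Z(\beta_1, \dots, \beta_K) \;=\; \sum_{G' \in \mathcal{G}} \exp\!\Big(-\sum_{k=1}^{K} \beta_k\, E_k(G')\Big)
```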
Dynamic Bayesian networks
Dynamic Bayesian Nets (DBNs)
A DBN for p variables and T time points
[Figure: a DBN unrolled over time points t = 1, ..., T, with p variables X1^t, ..., Xp^t in each slice; the first slice (t = 1) has its own dependency structure ("dependency at the first time point"), and X^t denotes the set of variables at time t, e.g. X^3 is the variables at t = 3]
Stationarity assumption in a Bayesian network
The stationarity assumption states that the dependency structure and parameters do not change with t.
Due to this assumption, we only need to specify dependencies between two sets of variables: those at time t and those at time t+1.
[Figure: under stationarity, the dependencies from slice t to slice t+1 (X1^t, ..., Xp^t → X1^{t+1}, ..., Xp^{t+1}) are the same for every t, so the unrolled network over t = 1, ..., T is specified by this single slice-to-slice template]
Computing the joint probability distribution in a DBN
The joint probability distribution can be factored into a product of conditional distributions across time and variables, with the parents of X_i^t defined by the graph G.
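Written out, with X_i^t denoting variable i at time t and parent sets given by G (the same from slice to slice, by stationarity), the factorization is:

```latex
P\big(X^{1}, \dots, X^{T}\big) \;=\; \prod_{i=1}^{p} P\big(X_i^{1} \mid \mathrm{Pa}(X_i^{1})\big) \; \prod_{t=2}^{T} \prod_{i=1}^{p} P\big(X_i^{t} \mid \mathrm{Pa}(X_i^{t})\big)
```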
The partition function for a DBN prior
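Assuming the energy decomposes over nodes as E(G) = Σ_i E_i(Pa(X_i)), which holds for edge-wise energies, and noting that in a DBN every edge points forward in time so no acyclicity constraint couples the parent sets (a point made by Husmeier, 2005), the partition function factorizes into independent per-node sums:

```latex
Z(\beta) \;=\; \sum_{G} e^{-\beta E(G)} \;=\; \prod_{i=1}^{p} \; \sum_{\mathrm{Pa}(X_i)} e^{-\beta\, E_i(\mathrm{Pa}(X_i))}
```

Each inner sum ranges over candidate parent sets of one node, making Z tractable for DBNs where it is not for general DAGs.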
Plan for today
Markov Chain Monte Carlo (MCMC) sampling
MCMC for learning a graph structure
Markov chain
A very simple Markov chain
[Figure: a three-state Markov chain (states high, medium, low) with transition probabilities labeling the arrows, e.g. P(Xt+1 = high | Xt = low) = 0.1]
We will use T(Xt+1 | Xt) to denote the transition probabilities, which define the Markov chain.
Markov Chain and Stationary distributions
Stationary distribution of a Markov chain
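A minimal sketch of computing a stationary distribution by power iteration; the 3-state transition matrix below is illustrative, not necessarily the one in the figure:

```python
import numpy as np

# Illustrative transition matrix for states (high, medium, low):
# rows = current state, columns = next state; each row sums to 1.
T = np.array([[0.6, 0.2, 0.2],
              [0.1, 0.7, 0.2],
              [0.1, 0.3, 0.6]])

def stationary(T, iters=1000):
    """Power iteration: repeatedly push a distribution through the
    transition matrix until it stops changing. The fixed point pi
    satisfies pi @ T = pi, the defining property of stationarity."""
    pi = np.full(T.shape[0], 1.0 / T.shape[0])  # start uniform
    for _ in range(iters):
        pi = pi @ T
    return pi

pi = stationary(T)
print(pi)           # stationary distribution over (high, medium, low)
print(pi @ T - pi)  # ~0: pi is unchanged by a transition step
```

For an irreducible, aperiodic chain like this one, the iteration converges to the unique stationary distribution from any starting point; this is exactly the property MCMC exploits, by constructing a chain whose stationary distribution is the posterior of interest.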
Markov Chains for Bayesian network structure learning
How do we make sure we will draw from the right distribution?
Markov Chains for Bayesian network structure learning
Acceptance probability
Metropolis Hastings algorithm
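In structure MCMC, a proposed graph G' drawn from the proposal distribution Q is accepted with probability combining the posterior ratio with a Hastings correction:

```latex
A(G' \mid G) \;=\; \min\!\left\{1,\; \frac{P(D \mid G')\, P(G')\, Q(G \mid G')}{P(D \mid G)\, P(G)\, Q(G' \mid G)}\right\}
```

When G' is drawn uniformly from the neighborhood N(G) of graphs reachable by one elementary move, Q(G' | G) = 1/|N(G)|, so the correction term reduces to |N(G)| / |N(G')|.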
Elementary proposal moves for DAGs
The proposal distribution is defined by the moves on the graph. The above example shows a scenario where we have two valid configurations and a third invalid configuration.
Husmeier, 2005
Defining a proposal distribution from elementary moves
Notice that the neighborhoods of the two DAGs are not of the same size
MCMC example
Husmeier 2005
A heuristic to check for MCMC convergence
MCMC for learning a graph prior and structure
MCMC over graph structure and parameters
MCMC over graph structure and hyperparameter
Plan for today
Performance on real data
Inferred hyperparameters for the yeast cell cycle
The two prior networks are very similar
Red and blue show the trajectory of the hyperparameter values during the MCMC
Posterior probability of the hyperparameters: close to 0.
Using a slightly different prior
Prior hyperparameters can be distinguished
Prior that is consistent with the data
Conclusions from the Yeast cell cycle study
Assessing on a well-studied gold standard network: Raf signaling pathway
11 phospho-proteins in all.
Results on RAF signaling
Prior helps!
Method can discriminate between true and random prior
Plan for today
Bayesian Inference of Signaling Network Topology in a Cancer Cell Line (Hill et al 2012)
Applying DBNs to infer signaling network topology
Hill et al., Bioinformatics 2012
Application of DBNs to signaling networks
Integrating prior signaling network into the DBN
Data likelihood
Graph prior
Prior strength
Graph features
Prior following Mukherjee & Speed 2008
Calculating posterior probabilities of edges
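Edge posteriors of this kind are typically approximated by model averaging over an MCMC sample: the posterior probability of an edge is the fraction of sampled graphs containing it. A minimal sketch, with an illustrative adjacency-matrix representation:

```python
import numpy as np

def edge_posteriors(samples):
    """Given a list of 0/1 adjacency matrices sampled from P(G | D),
    approximate P(edge i->j | D) by the fraction of samples containing
    that edge (a Monte Carlo average of the edge indicator)."""
    return np.mean(np.stack(samples), axis=0)

# Toy run: three "sampled" graphs over 3 nodes.
samples = [np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]]),
           np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]]),
           np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])]
P = edge_posteriors(samples)
print(P[0, 1])  # edge 0->1 is in all 3 samples -> 1.0
print(P[1, 2])  # edge 1->2 is in 2 of 3 samples -> 0.667
```

Thresholding these marginal edge probabilities (rather than reporting a single best graph) is what yields the confidence-weighted networks reported in studies like Hill et al.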
Inferred signaling network using a DBN
Results are not sensitive to prior values
Inferred signaling network
Collapsed network
DBN
Using the DBN to make predictions
Experimental validation of links
Add MAPK inhibitor and measure MAPK and STAT3
Success is measured by the difference in the levels of the targets as a function of inhibitor level
MAPK is significantly inhibited (P-value 5×10^-4)
STAT3 is also inhibited (P-value 3.3×10^-4)
Concluding remarks
References
For a candidate pair X1, Y2: is I12 = 1 or 0?
I12 = 1 if X1 interacts with Y2, 0 otherwise
Define: X1Y2.features: attributes of X1 and Y2
Given: prob. of interaction P(I12=1 | X1Y2.features)
We need: prob. of no interaction P(I12=0 | X1Y2.features)
Supervised learning of interactions
[Figure: decision rule for a pair (X1, Y2): if prob. of interacting > prob. of non-interacting, predict I12 = 1 (Yes); otherwise predict I12 = 0 (No)]
Supervised learning of interactions
[Figure: supervised pipeline. Training: positive example pairs (A-B, C-D, E-F, ...) and negative example pairs (G-H, I-J, K-L, ...) go through feature extraction to build a feature set, which is used to train a classifier. Testing: unlabeled pairs (e.g. E-A, G-L) are scored by the trained classifier to produce predicted edges]
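The pipeline above can be sketched end to end with a tiny logistic-regression classifier; the pair features here are synthetic stand-ins, not MouseNet's actual feature set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features for protein pairs (e.g. coexpression, shared
# annotations): positives and negatives drawn from shifted Gaussians.
X_pos = rng.normal(loc=1.0, size=(50, 4))
X_neg = rng.normal(loc=-1.0, size=(50, 4))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)  # I = 1 interacting, 0 not

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train logistic regression by gradient descent:
# sigmoid(X @ w + b) models P(I = 1 | features).
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * (p - y).mean()

# Predict an edge when P(I=1 | features) > P(I=0 | features), i.e. > 0.5.
test_pair = rng.normal(loc=1.0, size=4)  # features resembling a positive pair
print(sigmoid(test_pair @ w + b) > 0.5)
```

In practice the positive set comes from known interactions, the negative set from random non-interacting pairs, and any probabilistic classifier (in MouseNet's case, a Bayesian framework over heterogeneous data sources) can stand in for the logistic regression used here.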
MouseNet: Supervised inference of a functional network
Guan et al., PLoS Computational Biology 2008