1 of 69

Input Output HMMs for modeling network dynamics

Oct 12th and 14th, 2021

BMI 826-23 Computational Network Biology, Fall 2021

Anthony Gitter

https://compnetbiocourse.discovery.wisc.edu

The style of IOHMM is adapted from Prof. Mark Craven’s lectures on HMMs

Original IOHMM slides created by Prof. Sushmita Roy

2 of 69

Goals for today

  • What are Hidden Markov Models (HMMs)?
    • How do they relate to Dynamic Bayesian Networks?
  • What are Input-Output HMMs (IOHMMs)?
  • EM algorithm for learning IOHMMs
  • Application of IOHMMs to examine regulatory network dynamics

3 of 69

Motivation

  • Suppose we are given time series expression profiles
  • We wish to find key regulators that are associated with changes in expression levels over time
  • We have seen a simple approach to do this
    • Activity subgraph/skeleton network-based approaches
  • Can we more explicitly take time into account?

4 of 69

DREM: Dynamic Regulatory Events Miner

Ernst et al., 2007, Mol Sys Biol

5 of 69

Recall Markov chains

  • A Markov chain is a probabilistic model for sequential observations where there is a dependency between the current and the previous state
  • It is defined by a graph of possible states and a transition probability matrix defining transitions between each pair of states
  • The states correspond to the possible values a variable can take
  • One can think of a Markov chain as doing a random walk on a graph with nodes corresponding to each state
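The random-walk view above can be sketched in a few lines of Python. Only P(Xt+1=high | Xt=low) = 0.1 comes from the three-state example on the next slide; the remaining transition probabilities below are assumed values for illustration.

```python
# A random walk on a 3-state Markov chain (states: high, medium, low).
# Only P(next=high | current=low) = 0.1 is from the example; the other
# entries of this transition matrix are assumed for illustration.
import random

states = ["high", "medium", "low"]
# trans[s] = distribution over next states; each row sums to 1
trans = {
    "high":   {"high": 0.6, "medium": 0.2, "low": 0.2},
    "medium": {"high": 0.1, "medium": 0.7, "low": 0.2},
    "low":    {"high": 0.1, "medium": 0.3, "low": 0.6},
}

def walk(start, steps, rng=random):
    """Sample a state sequence by repeatedly drawing the next state."""
    seq = [start]
    for _ in range(steps):
        weights = [trans[seq[-1]][s] for s in states]
        seq.append(rng.choices(states, weights=weights)[0])
    return seq

print(walk("high", 10))
```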

6 of 69

A three state Markov chain

[Figure: a three-state Markov chain over the states high, medium, and low, with a transition probability labeling each edge]

P(Xt+1=high|Xt=low)=0.1

These define the transition probabilities

7 of 69

Notation for a Markov chain

  • States are numbered from 1 to K
    • xt: observed character at position t
  • Observed sequence of characters: x = x1 x2 .. xT

  • Transition probabilities: akl = P(Xt+1 = l | Xt = k)

8 of 69

Hidden Markov Models

  • Hidden Markov models are also probabilistic models used to model sequential data about a dynamical system
  • At each time point the system is in a hidden state that depends upon the previous states (history)
  • The observation sequence is the output of a hidden state
  • HMMs are defined by observation models and transition models

Murphy 2000

9 of 69

What does an HMM do?

  • Enables us to model observed sequences of characters generated by a hidden dynamic system
  • The system can exist in a fixed number of “hidden” states
  • The system probabilistically transitions between states and at each state it emits a symbol/character

10 of 69

Defining an HMM

  • States
  • Emission alphabet
  • Parameters
    • State transition probabilities for probabilistic transitions from state at time t to state at time t+1
    • Emission probabilities for probabilistically emitting symbols from a state

11 of 69

Notation

  • States are numbered from 1 to K
  • Observed sequence: x = x1 x2 .. xT
  • Hidden state sequence or path: π = π1 π2 .. πT
  • Transition probabilities: akl = P(πt = l | πt-1 = k)

  • Emission probabilities: ek(b) = P(xt = b | πt = k), the probability of emitting symbol b from state k

12 of 69

An HMM for an occasionally dishonest casino

Fair die:   P(1) = P(2) = .. = P(6) = 1/6
Loaded die: P(1) = P(2) = .. = P(5) = 1/10, P(6) = 1/2

Transitions: Fair→Fair 0.95, Fair→Loaded 0.05, Loaded→Loaded 0.9, Loaded→Fair 0.1

What is hidden?

What is observed?

13 of 69

An HMM for an occasionally dishonest casino

(Same model as on the previous slide)

What is hidden? Which die is rolled (Fair or Loaded)

What is observed? The number (1-6) on the die
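The casino's generative process can be simulated directly from these probabilities. A sketch in Python; the start state is not specified on the slide, so starting with the fair die is an assumption.

```python
# Sampling hidden states and rolls from the dishonest casino HMM.
# Transition and emission probabilities are from the slide; starting
# with the fair die is an assumption (the start state is not given).
import random

trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": [1/6] * 6, "L": [0.1] * 5 + [0.5]}   # P(roll = 1..6 | die)

def roll_sequence(T, rng=random):
    """Sample T hidden states (which die) and T observed rolls (1-6)."""
    states, rolls = [], []
    s = "F"                                        # assumed start state
    for _ in range(T):
        states.append(s)
        rolls.append(rng.choices(range(1, 7), weights=emit[s])[0])
        s = rng.choices(["F", "L"], weights=[trans[s]["F"], trans[s]["L"]])[0]
    return states, rolls

states, rolls = roll_sequence(20)
print(states)
print(rolls)
```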

14 of 69

Goals for today

  • What are Hidden Markov Models (HMMs)?
    • How do they relate to Dynamic Bayesian Networks?
  • What are Input-Output HMMs (IOHMMs)?
  • Learning problems of IOHMMs
  • Application of IOHMMs to examine regulatory network dynamics

15 of 69

Recall a DBN for p variables and T time points

[Figure: the DBN unrolled over time. Each time point t = 1, 2, .., T has p variables X1, X2, .., Xp; edges connect variables at time t-1 to variables at time t, and a separate dependency structure is specified at the first time point. X2 denotes the set of variables at time t=2.]

16 of 69

HMM represented as a DBN

DBN

  • A DBN could be used to represent the transition probabilities more compactly.

  • For example, consider the state variable to be D-dimensional, where each dimension has K possible values.
    • For example, we are tracking D objects and each object can have K possible settings
    • The state variable can have K^D possible values

  • An HMM will attempt to model the transition probabilities between all state combinations.

  • In other words, the DBN will look fully connected.

Kevin Murphy, 2000
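The blow-up above is easy to quantify. A back-of-the-envelope computation, with assumed values K = 3 and D = 10 and an assumed in-degree of 3 per variable in the factored DBN:

```python
# Size of the state space when one HMM "super-state" packs D variables,
# each with K values. K, D, and the in-degree of 3 are assumed values.
K, D = 3, 10
joint_states = K ** D                 # number of super-states
joint_table = joint_states ** 2       # entries in the full transition matrix
# Factored DBN: each variable depends on 3 variables at time t-1,
# so each CPT has K^3 parent configurations times K values.
factored = D * (K ** 3) * K
print(joint_table, factored)
```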

17 of 69

DBN version of the occasional dishonest casino

Transition probabilities:

        F     L
  F    0.95  0.05
  L    0.1   0.9

Emission probabilities:

        1     2     3     4     5     6
  F    1/6   1/6   1/6   1/6   1/6   1/6
  L    0.1   0.1   0.1   0.1   0.1   0.5

18 of 69

Three important questions in HMMs

  • What is the probability of a sequence from an HMM?
    • Forward algorithm
  • What is the most likely sequence of states for generating a sequence of observations
    • Viterbi algorithm
  • How can we learn an HMM from a set of sequences?
    • Forward-backward or Baum-Welch (an EM algorithm)

19 of 69

Computing the probability of a sequence from an HMM

P(x1..xT, π1..πT) = P(π1) eπ1(x1) ∏t=2..T aπt-1,πt eπt(xt)

  eπt(xt): emitting symbol xt
  aπt-1,πt: state transition between consecutive time points
20 of 69

Computing the probability of a sequence from an HMM

  • But we don’t know what the sequence of states (path) is
  • So we need to sum over all paths
  • The probability over all paths is:

      P(x1..xT) = Σπ P(x1..xT, π1..πT)    (sum over all paths π)

  • The forward algorithm gives an efficient way to compute this probability
  • It is based on the concept of dynamic programming
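The sum over paths and the forward recursion can be compared directly on the casino HMM: enumerating all 2^T paths explicitly gives the same probability as the O(T K^2) dynamic program. A sketch; the uniform start distribution and the example roll sequence are assumptions.

```python
# P(x) for the casino HMM two ways: brute-force sum over all 2^T paths,
# and the forward recursion. The uniform start distribution and the
# observed rolls are assumed for illustration.
from itertools import product

init = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": [1/6] * 6, "L": [0.1] * 5 + [0.5]}

x = [6, 6, 1, 6]   # observed rolls (assumed example)

def brute_force(x):
    """Sum P(x, path) over every possible hidden state path."""
    total = 0.0
    for path in product("FL", repeat=len(x)):
        p = init[path[0]] * emit[path[0]][x[0] - 1]
        for t in range(1, len(x)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][x[t] - 1]
        total += p
    return total

def forward(x):
    """Dynamic program: f[k] = P(x_1..x_t, state_t = k)."""
    f = {k: init[k] * emit[k][x[0] - 1] for k in "FL"}
    for t in range(1, len(x)):
        f = {l: emit[l][x[t] - 1] * sum(f[k] * trans[k][l] for k in "FL")
             for l in "FL"}
    return sum(f.values())

print(brute_force(x), forward(x))
```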

21 of 69

Goals for today

  • What are Hidden Markov Models (HMMs)?
    • How do they relate to Dynamic Bayesian Networks?
  • What are Input-Output HMMs (IOHMMs)?
  • Learning problems of IOHMMs
  • Application of IOHMMs to examine regulatory network dynamics

22 of 69

Input output Hidden Markov Models (IOHMM)

  • As in the HMM we have
    • States, emissions and transitions
  • In addition we have a sequence of inputs
    • The transitions and emissions can depend on inputs (u1,..,uT)
  • In a way, IOHMMs map inputs to outputs
    • This is different from HMMs
  • HMMs aim to define P(x1..xT) while IOHMMs define P(x1..xT|u1..uT)

Bengio & Frasconi, IEEE Trans on Neural Networks 1996

23 of 69

Input output Hidden Markov Models (IOHMM)

Inputs

Outputs

Hidden states

Transitions and emissions are dependent upon a set of external stimuli

24 of 69

Formally defining an IOHMM

  • The set of K hidden states
  • Emission characters/symbols/values
  • Transition probabilities conditioned on the input
    • Unlike HMMs, where we had akl = P(πt = l | πt-1 = k)
    • Here we have P(πt = l | πt-1 = k, ut)
  • Similarly, the emission probabilities are conditioned on the input and the state: P(xt = b | πt = k, ut)

25 of 69

Three important questions in IOHMMs

  • What is the probability of a sequence from an IOHMM?
    • Forward algorithm
  • What is the most likely sequence of states for generating a sequence of observations
    • Viterbi algorithm
  • How can we learn an IOHMM from a set of sequences?
    • Forward-backward algorithm (an EM algorithm)

26 of 69

Computing the probability of a sequence from an IOHMM

P(x1..xT, π1..πT | u1..uT) = P(π1 | u1) eπ1(x1 | u1) ∏t=2..T P(πt | πt-1, ut) eπt(xt | ut)

  eπt(xt | ut): emitting symbol xt
  P(πt | πt-1, ut): state transition between consecutive time points

27 of 69

As in the case of HMMs

  • We would need to sum over the possible state configurations (paths):

      P(x1..xT | u1..uT) = Σπ P(x1..xT, π1..πT | u1..uT)    (sum over all paths π)

  • We will use the forward algorithm for this problem

28 of 69

How likely is a given sequence: Forward algorithm

  • Define fk(t) as the probability of observing x1..xt and ending in state k at time t, given inputs u1..ut:

      fk(t) = P(x1..xt, πt = k | u1..ut)

  • This can be written as follows

      fk(t) = ek(xt | ut) Σl fl(t-1) P(πt = k | πt-1 = l, ut)

29 of 69

Steps of the Forward algorithm

  • Initialization

      fk(1) = P(π1 = k | u1) ek(x1 | u1)

  • Recursion: for t=2 to T

      fk(t) = ek(xt | ut) Σl fl(t-1) P(πt = k | πt-1 = l, ut)

  • Termination

      P(x1..xT | u1..uT) = Σk fk(T)

30 of 69

Working through an example

  • Suppose we measured three reporter molecules whose values depend upon an input chemical stimulus and which of four possible hidden pathways is triggered.
  • Chemical stimulus: {0, 1}
  • Hidden Pathways: {A, B, C, D}
  • Reporter molecules: {r1, r2, r3}
  • Given a sequence of reporter molecule measurements, and chemical stimuli, we can infer different quantities associated with this sequence

31 of 69

Mapping to an IOHMM

Inputs

Outputs

Hidden states

We need to specify three CPTs

32 of 69

The CPTs that we will use

Initial state probabilities P(π1 | u1):

  u1      A     B     C     D
  0      0.5   0.5    0     0
  1      0.5   0.5    0     0

Transition probabilities P(πt | πt-1, ut):

  ut  πt-1     A     B     C     D
  0    A      0.6   0.2   0.2    0
  0    B      0.2   0.6    0    0.2
  0    C      0.1    0    0.8   0.1
  0    D       0    0.25  0.25  0.5
  1    A      0.8   0.1   0.1    0
  1    B      0.8   0.1    0    0.1
  1    C      0.1    0    0.8   0.1
  1    D       0    0.1   0.8   0.1

Emission probabilities P(xt | πt, ut):

  ut  πt      r1    r2    r3
  0    A     0.8   0.1   0.1
  0    B     0.2   0.6   0.2
  0    C     0.25  0.5   0.25
  0    D     0.2   0.2   0.6
  1    A     0.2   0.6   0.2
  1    B     0.25  0.5   0.25
  1    C     0.2   0.2   0.6
  1    D     0.5   0.25  0.25

Suppose we observed the following sequences

Input:  0  1  1  0
Output: r1 r1 r2 r3

How likely is this observation from our IOHMM?

33 of 69

Transition probabilities encode some independencies

The transition CPT P(πt | πt-1, ut) contains structural zeros, e.g. P(πt = D | πt-1 = A, ut = 0) = 0. These zero entries make certain transitions impossible, which restricts the state paths that can be reached.
34 of 69

Applying the forward algorithm

Input:  0  1  1  0
Output: r1 r1 r2 r3

Fill in fk(t) for each state k in {A, B, C, D} and each time point t = 1..4.

35 of 69

Working through some calculations: t=1

With Input: 0 1 1 0 and Output: r1 r1 r2 r3, using the CPTs from the earlier slide:

  fA(1) = P(π1 = A | u1 = 0) eA(r1 | u1 = 0) = 0.5 × 0.8 = 0.4
  fB(1) = P(π1 = B | u1 = 0) eB(r1 | u1 = 0) = 0.5 × 0.2 = 0.1
  fC(1) = 0 × 0.25 = 0
  fD(1) = 0 × 0.2  = 0

36 of 69

Working through some calculations: t=2

Continuing with u2 = 1 and x2 = r1, e.g. for state A (only A and B have nonzero forward values at t=1):

  fA(2) = eA(r1 | u2 = 1) [ fA(1) P(A | A, u2 = 1) + fB(1) P(A | B, u2 = 1) ]
        = 0.2 × (0.4 × 0.8 + 0.1 × 0.8) = 0.08

The remaining entries fB(2), fC(2), fD(2) are computed the same way, summing over all previous states.

Input:  0  1  1  0
Output: r1 r1 r2 r3

37 of 69

Result of applying the forward algorithm

Input:  0  1  1  0
Output: r1 r1 r2 r3

  fk(t)     t=1    t=2      t=3        t=4
  A         0.4    0.08     0.04488    0.0033861
  B         0.1    0.0125   0.033125   0.021860825
  C         0      0.008    0.00308    0.0028790625
  D         0      0.01     0.0007625  0.00438855
38 of 69

Learning an IOHMM from data

  • Given J paired sequences

  • Parameter estimation:
    • Learn the transition and emission probability distributions
    • This is very similar to what is done in HMMs
  • Structure learning:
    • Learn the number of states and the dependencies among the states
    • Because states are hidden variables and we do not know how many there are, this adds another level of complexity to learning
    • We will first assume that we know the number of states

39 of 69

The expectation maximization algorithm

  • Expectation Maximization (EM) is widely used when there are hidden variables
  • It is an iterative algorithm that maximizes the likelihood of the data
  • Each iteration is made up of two steps
    • Expectation step (E): estimate the expected values of hidden variables given the data and previous parameter settings
    • Maximization step (M): estimate the parameters using the expected counts

40 of 69

Learning without hidden information

  • Transition probabilities

      P(πt = l | πt-1 = k, ut = p) = nkl(p) / Σl' nkl'(p)

      nkl(p): number of transitions from state k to state l given input p

  • Emission probabilities

      ek(c | p) = mk(c, p) / Σc' mk(c', p)

      mk(c, p): number of times c is emitted from k given input p
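With fully observed state paths, both distributions reduce to normalized counts. A sketch; the toy sequences below are made up for illustration.

```python
# Maximum likelihood estimation of IOHMM parameters when state paths
# are fully observed: normalize transition and emission counts per
# input value. The toy sequences are made up for illustration.
from collections import Counter, defaultdict

def estimate(sequences):
    """Each sequence is (inputs u, states pi, outputs x), equal length."""
    t_counts, e_counts = Counter(), Counter()
    for u, pi, x in sequences:
        for t in range(len(x)):
            e_counts[(u[t], pi[t], x[t])] += 1           # emission given u[t]
            if t > 0:
                t_counts[(u[t], pi[t - 1], pi[t])] += 1  # transition given u[t]

    def normalize(counts):
        totals = defaultdict(float)
        for (u, k, _), n in counts.items():
            totals[(u, k)] += n
        return {key: n / totals[key[:2]] for key, n in counts.items()}

    return normalize(t_counts), normalize(e_counts)

seqs = [
    ([0, 1, 1], ["A", "B", "A"], ["r1", "r2", "r1"]),
    ([0, 0, 1], ["A", "A", "B"], ["r1", "r1", "r2"]),
    ([1, 1, 0], ["A", "A", "B"], ["r2", "r1", "r3"]),
]
trans, emit = estimate(seqs)
print(trans[(1, "A", "B")])   # n_AB(u=1) / sum_l n_Al(u=1)
```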

41 of 69

The expectation step

  • We need to know the probability of the symbol at t being produced by state i, given the entire observation and input sequence u1:T, x1:T
  • Given these we can compute our expected counts for state transitions, character emissions
  • We also need to know the probability of the observations at t and t+1 being produced by states i and j respectively, given the observation and input sequences

Bengio & Frasconi, IEEE Trans on Neural Networks 1996

42 of 69

Computing P(πt = k | x1..xT, u1..uT)

  • First we compute the probability of the entire observed sequence with the tth symbol being generated by state k:

      P(x1..xT, πt = k | u1..uT)

  • Then our quantity of interest is computed as

      P(πt = k | x1..xT, u1..uT) = P(x1..xT, πt = k | u1..uT) / P(x1..xT | u1..uT)

The denominator is obtained from the forward algorithm

43 of 69

Computing P(x1..xT, πt = k | u1..uT)

  • To compute

      P(x1..xT, πt = k | u1..uT) = fk(t) bk(t)

  • We need the forward and backward algorithms

Forward algorithm: fk(t) = P(x1..xt, πt = k | u1..ut)

Backward algorithm: bk(t) = P(xt+1..xT | πt = k, ut+1..uT)

44 of 69

Steps of the backward algorithm

  • Initialization (t=T)

      bk(T) = 1 for all k

  • Recursion (t=T-1 to 1)

      bk(t) = Σl P(πt+1 = l | πt = k, ut+1) el(xt+1 | ut+1) bl(t+1)

45 of 69

Trying out an example with backward algorithm

  • Again assume we have the same CPTs as those associated with the forward algorithm demo
  • Assume we observe the following

  • What are computations for the backward algorithm?

0 1 1

r1 r2 r2

Input:

Output:

46 of 69

Results from applying the backward algorithm

Input:  0  1  1
Output: r1 r2 r2

  bk(t)    t=1        t=2     t=3
  A        0.29665    0.55    1
  B        0.297625   0.555   1
  C        0.078075   0.245   1
  D        0.072825   0.235   1

(CPTs repeated from the forward algorithm example.)
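The backward recursion for this example can be sketched the same way as the forward pass, reusing the CPTs from the slides (the array encodings of states and symbols are my own choice):

```python
# Backward algorithm for the IOHMM example. CPTs are from the slides;
# index encodings (A-D -> 0-3, r1-r3 -> 0-2) are mine.
import numpy as np

# trans[u][k][l] = P(pi_{t+1} = l | pi_t = k, u_{t+1} = u)
trans = np.array([
    [[0.6, 0.2, 0.2, 0.0],
     [0.2, 0.6, 0.0, 0.2],
     [0.1, 0.0, 0.8, 0.1],
     [0.0, 0.25, 0.25, 0.5]],
    [[0.8, 0.1, 0.1, 0.0],
     [0.8, 0.1, 0.0, 0.1],
     [0.1, 0.0, 0.8, 0.1],
     [0.0, 0.1, 0.8, 0.1]],
])
# emit[u][k][b] = P(x_t = b | pi_t = k, u_t = u)
emit = np.array([
    [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.25, 0.5, 0.25],
     [0.2, 0.2, 0.6]],
    [[0.2, 0.6, 0.2],
     [0.25, 0.5, 0.25],
     [0.2, 0.2, 0.6],
     [0.5, 0.25, 0.25]],
])

def backward(u, x):
    """b[t][k] = P(x_{t+1}..x_T | pi_t = k, u_{t+1}..u_T)."""
    b = np.ones((len(x), 4))                  # initialization: b_k(T) = 1
    for t in range(len(x) - 2, -1, -1):       # recursion, t = T-1 down to 1
        b[t] = trans[u[t + 1]] @ (emit[u[t + 1]][:, x[t + 1]] * b[t + 1])
    return b

u = [0, 1, 1]
x = [0, 1, 1]                                 # r1 r2 r2
b = backward(u, x)
print(b)                                      # rows: t = 1..3; cols: A, B, C, D
```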

47 of 69

Computing P(πt = k | x1..xT, u1..uT)

  • Using the forward and backward variables, this is computed as

      P(πt = k | x1..xT, u1..uT) = fk(t) bk(t) / Σl fl(t) bl(t)
48 of 69

Computing P(πt = k, πt+1 = l | x1..xT, u1..uT)

  • This is the probability of the symbols at t and t+1 being emitted from states k and l, given the entire observation sequence x1..xT and input sequence u1..uT:

      P(πt = k, πt+1 = l | x1..xT, u1..uT)
        = fk(t) P(πt+1 = l | πt = k, ut+1) el(xt+1 | ut+1) bl(t+1) / P(x1..xT | u1..uT)
49 of 69

Putting it all together

  • Assume we are given J training instances

  • Expectation step
    • Using current parameter values for each
      • Apply the forward and backward algorithms
      • Compute
        • expected number of transitions between all pairs of states
        • expected number of emissions for all states
  • Maximization step
    • Using current expected counts
      • Compute the transition and emission probabilities
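The E step quantities can be assembled from the forward and backward variables. A sketch on the earlier toy example; these posteriors are the usual γ (state) and ξ (transition) quantities, and the array encodings are my own.

```python
# E step of Baum-Welch for the toy IOHMM: state posteriors gamma[t][k]
# and transition posteriors xi[t][k][l]. CPTs are from the worked
# example; index encodings are mine.
import numpy as np

init = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.5, 0.5, 0.0, 0.0]])
trans = np.array([
    [[0.6, 0.2, 0.2, 0.0],
     [0.2, 0.6, 0.0, 0.2],
     [0.1, 0.0, 0.8, 0.1],
     [0.0, 0.25, 0.25, 0.5]],
    [[0.8, 0.1, 0.1, 0.0],
     [0.8, 0.1, 0.0, 0.1],
     [0.1, 0.0, 0.8, 0.1],
     [0.0, 0.1, 0.8, 0.1]],
])
emit = np.array([
    [[0.8, 0.1, 0.1],
     [0.2, 0.6, 0.2],
     [0.25, 0.5, 0.25],
     [0.2, 0.2, 0.6]],
    [[0.2, 0.6, 0.2],
     [0.25, 0.5, 0.25],
     [0.2, 0.2, 0.6],
     [0.5, 0.25, 0.25]],
])

def forward(u, x):
    f = np.zeros((len(x), 4))
    f[0] = init[u[0]] * emit[u[0]][:, x[0]]
    for t in range(1, len(x)):
        f[t] = emit[u[t]][:, x[t]] * (f[t - 1] @ trans[u[t]])
    return f

def backward(u, x):
    b = np.ones((len(x), 4))
    for t in range(len(x) - 2, -1, -1):
        b[t] = trans[u[t + 1]] @ (emit[u[t + 1]][:, x[t + 1]] * b[t + 1])
    return b

u, x = [0, 1, 1, 0], [0, 0, 1, 2]    # inputs 0 1 1 0, outputs r1 r1 r2 r3
f, b = forward(u, x), backward(u, x)
px = f[-1].sum()                     # P(x_1..x_T | u_1..u_T)
gamma = f * b / px                   # gamma[t][k] = P(pi_t = k | x, u)
xi = np.zeros((len(x) - 1, 4, 4))    # xi[t][k][l] = P(pi_t=k, pi_{t+1}=l | x, u)
for t in range(len(x) - 1):
    xi[t] = (f[t][:, None] * trans[u[t + 1]]
             * (emit[u[t + 1]][:, x[t + 1]] * b[t + 1])[None, :]) / px
print(gamma)
```

In the M step, the expected counts are obtained by summing γ and ξ over time points (and training sequences), grouped by input value, before normalizing.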

50 of 69

Baum-Welch one iteration

  • Let’s assume we have J=2 training instances

  • Each training example will contribute to the expected counts of transition and emission probabilities
  • Expectation step:
    • Compute the forward and backward variables for both training samples, for all time points

51 of 69

Baum-Welch one iteration M step

  • Suppose we are updating the transition probability from A to B given input u=1

Contribution from sample 1

Sample 2 will not contribute as there is no relevant configuration

52 of 69

Baum-Welch one iteration M step

  • Suppose we are updating the expected counts for observing r2 from state B given input u=1

Contribution from sample 1

Contribution from sample 2

53 of 69

Goals for today

  • What are Hidden Markov Models (HMMs)?
    • How do they relate to Dynamic Bayesian Networks?
  • What are Input-Output HMMs (IOHMMs)?
  • Learning problems of IOHMMs
  • Application of IOHMMs to examine regulatory network dynamics

54 of 69

Bifurcation events

  • Bifurcation events occur when sets of genes that have roughly the same expression level up to some time point diverge afterward


55 of 69

Dynamic Regulatory Events Miner (DREM)

  • Given
    • Gene expression time course
    • Static TF binding data or signaling networks
  • Do
    • Identify important regulators for interesting temporal changes

  • DREM is suited for short time courses
  • DREM is based on an Input-Output HMM

Ernst et al., 2007 Mol Sys Biol

56 of 69

DREM inputs and output

Bar-Joseph et al. 2012 Nat Rev Genet

[Figure: DREM inputs - static TF-gene binding interactions and a time series of gene expression (log2 fold change) measured at times t1, t2, .., tN]

57 of 69

DREM key idea

Ernst et al., 2007, Mol Sys Biol

58 of 69

IOHMM model in DREM

  • The output distributions were modeled as Gaussians
    • Enabled modeling continuous expression values
  • State transitions depended on static input and the current state
    • A binary classifier was trained in the M step for each state with two children, to discriminate between genes assigned to the bifurcating states

59 of 69

Defining the transition probability in DREM

  • DREM uses a binary classifier (logistic regression) to define transition probabilities
  • Assume we are in state h, which has two child states a and b

      P(πt+1 = a | πt = h, ui) = 1 / (1 + exp(-(w0 + Σj wj uij)))
      P(πt+1 = b | πt = h, ui) = 1 - P(πt+1 = a | πt = h, ui)

  • Retrain in the maximization step of Baum-Welch

ui: input associated with the ith gene, the collection of binding sites on gene i's promoter

w0, wj: state-specific parameters
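A minimal sketch of such a split classifier; the TF names, binding profile, and weights below are illustrative examples, not values from DREM:

```python
# DREM-style transition probability at a split: logistic regression
# over a gene's TF binding profile decides which child state the gene
# follows. TF names, binding vector, and weights are made up.
import numpy as np

def split_prob(binding, w, w0):
    """P(child a | parent h, binding profile) via logistic regression."""
    return 1.0 / (1.0 + np.exp(-(w0 + binding @ w)))

tfs = ["TF1", "TF2", "TF3"]               # hypothetical regulators
binding = np.array([1.0, 0.0, 1.0])       # gene i bound by TF1 and TF3
w = np.array([2.0, -1.0, 0.5])            # hypothetical state-specific weights
w0 = -1.0
p_a = split_prob(binding, w, w0)
print(p_a, 1 - p_a)                       # probabilities of the two children
```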

60 of 69

Structure learning in DREM

  • How do we find the tree structure?

  • Split genes into train and test
  • Initialize as linear chain
  • Iterate
    • Consider adding each possible child state, retrain
    • Consider removing each branch, retrain
    • Track best structure evaluated on test genes
  • Resplit genes and tune tree structure

61 of 69

Results

  • Application of DREM to yeast expression data
    • Amino acid (AA) starvation
    • One time point ChIP binding in AA starvation
  • Analysis of condition-specific binding
  • Application to multiple stress and normal conditions

62 of 69

DREM application in yeast amino acid starvation

DREM identified 11 paths and associated important AA-related TFs with each split

63 of 69

Does condition non-specific data help?

Yes, adding additional non-condition-specific data helped explain more splits or find more TFs per split

64 of 69

Validation of INO4 binding

  • INO4 was a novel prediction by the method
  • A small-scale experiment tested binding at 4 gene promoters after AA starvation
  • Genome-wide binding profiles of INO4 were measured in AA starvation and SCD to compare relative binding

65 of 69

Validation of INO4 binding

Genes associated with INO4 split profile

INO4 occupancy is much higher in AA starvation compared to normal (SCD)

More genes are bound genome-wide in AA starvation

Stronger binding in AA starvation of genes in this path

66 of 69

Does integration help?

  • Randomize the ChIP data and ask whether TFs enriched along paths are still identified
    • Fewer TFs were identified
  • Compare IOHMM vs HMM
    • Weaker enrichment of Gene Ontology processes in HMM paths compared to IOHMM paths

67 of 69

Take away points

  • Network dynamics can be defined in multiple ways
  • Skeleton network-based approaches
    • The universe of networks is fixed, nodes become on or off
    • Simple to implement, and does not need a lot of data
    • No assumption of how the network changes over time
    • No model of how the network changes over time
    • Requires the skeleton network to be complete

68 of 69

Take away points

  • Network dynamics can be defined in multiple ways
  • Dynamic Bayesian network
    • Can learn new edges
    • Describes how the system transitions from one state to another
    • Can incorporate prior knowledge
    • Assumes that the dependency between t-1 and t is the same for all time points (we saw how to relax this)
    • Requires a sufficient number of time points

69 of 69

Take away points

  • Network dynamics can be defined in multiple ways
  • IOHMMs (DREM approach)
    • Integrates static TF-DNA and dynamic gene expression responses
    • Works at the level of groups of genes
    • Does not assume TF expression controls TF activity
    • Focus on bifurcation points in the time course
    • Tree structure might be restrictive (although possible extensions are discussed)
    • Depends upon the completeness of the TF binding data