1 of 45

Hidden Markov models

Lior Pachter

California Institute of Technology


Lecture 15

Caltech Bi/BE/CS183

Spring 2023

These slides are distributed under the CC BY 4.0 license

2 of 45

Laws of motion

  • The only mention of time so far in the class was in reference to Gauss’ discovery of the method of least squares while performing a tour-de-force calculation to estimate the orbit of Ceres. This involved estimating 7 parameters from 19 noisy measurements (Lecture 3).
  • In physics the relevance of time is apparent in the differential equation $F = m\,\frac{d^2x}{dt^2}$ (Newton's second law).
  • Cell states change with time just like planet positions do. What are the laws of motion for cells?


3 of 45

Single-cell snapshots of cellular motion


4 of 45

Measuring cell states across time


5 of 45

Pseudotime assignment


Paul Magwene

6 of 45

Pseudotime assignment


Paul Magwene

7 of 45

Measuring cell states across time


Determining when genes are active

Hidden Markov model

8 of 45

Hidden Markov model


EM algorithm

Dynamic programming

9 of 45

A (discrete time) Markov chain

  • A discrete time Markov chain is a sequence of random variables $X_1, X_2, \ldots, X_n$ with the Markov property, which means that the probability that some $X_i$ is in a certain state depends only on the state of the previous random variable $X_{i-1}$: $\Pr(X_i = x \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}) = \Pr(X_i = x \mid X_{i-1} = x_{i-1})$ (a simulation sketch follows below).
  • The values that the random variables take on are called states, and the set of states is called the state space.
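A minimal sketch (not from the slides) of simulating such a chain in Python; the two states, the transition matrix, and its values are illustrative assumptions:

```python
import numpy as np

# Hypothetical two-state chain with states "A" and "B" (values are illustrative).
states = ["A", "B"]
P = np.array([[0.9, 0.1],   # Pr(next state | current state A)
              [0.2, 0.8]])  # Pr(next state | current state B)

rng = np.random.default_rng(0)

def simulate(n, start=0):
    """Sample X_1, ..., X_n; each X_i depends only on X_{i-1} (Markov property)."""
    x = [start]
    for _ in range(n - 1):
        x.append(rng.choice(2, p=P[x[-1]]))
    return [states[i] for i in x]

print("".join(simulate(10)))
```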


10 of 45

A (discrete time) Markov chain

  • Homogeneous Markov chains are processes where the transition probabilities are constant: $\Pr(X_{i+1} = x \mid X_i = y) = \Pr(X_i = x \mid X_{i-1} = y)$ for all $i$.
  • Stationary Markov chains satisfy $\Pr(X_i = x) = \Pr(X_{i+1} = x)$ for all $i$ (see the sketch below).
  • A Markov chain can have memory, which means that the current state can depend on a finite number (> 1) of previous states.
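A short sketch, under an assumed transition matrix, of computing the stationary distribution $\pi$ (satisfying $\pi P = \pi$) of a homogeneous two-state chain:

```python
import numpy as np

# Assumed transition matrix for a homogeneous two-state chain (rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# The stationary distribution is the left eigenvector of P with eigenvalue 1,
# normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
print(pi)  # ~[0.667, 0.333]; if X_1 ~ pi, then Pr(X_i = x) is the same for all i
```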


11 of 45

A two state Markov chain


12 of 45

A two state Markov chain


BAABAA

13 of 45

A lattice view


14 of 45

A two-state gene expression model


15 of 45

A hidden Markov model


16 of 45

A hidden Markov model


What is the probability of observing the sequence 1,4,3,6,6,4?

Solved with the forward algorithm.
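A minimal sketch of the forward algorithm in Python. The two hidden states A and B, the die-like emissions 1–6, and all parameter values below are illustrative assumptions, not the ones from the slides:

```python
import numpy as np

states = ["A", "B"]
init = np.array([0.5, 0.5])           # assumed initial distribution
trans = np.array([[0.9, 0.1],         # assumed Pr(next hidden state | current)
                  [0.2, 0.8]])
# assumed emission probabilities for symbols 1..6 in each hidden state
emit = np.array([[1/6] * 6,
                 [1/10] * 5 + [1/2]])

def forward_likelihood(obs):
    """Pr(Y_1..Y_n = obs), summing over all hidden paths in O(n * #states^2) time."""
    f = init * emit[:, obs[0] - 1]                # f[k] = Pr(Y_1, X_1 = k)
    for y in obs[1:]:
        f = (f @ trans) * emit[:, y - 1]          # forward recursion
    return f.sum()

print(forward_likelihood([1, 4, 3, 6, 6, 4]))
```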

17 of 45

A hidden Markov model


What is the most likely sequence of states of the Markov chain to have resulted in 1,4,3,6,6,4?

Solved with the Viterbi algorithm.

18 of 45

A hidden Markov model


What is the probability that X3 = B given Y0 = 1, Y1 = 4, Y2 = 3, Y3 = 6, Y4 = 6, Y5 = 4?

Solved with the forward-backward algorithm.
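This posterior has a standard expression in terms of forward and backward quantities (my notation, not from the slides): writing $f_i(s) = \Pr(Y_1,\ldots,Y_i,\, X_i = s)$ and $b_i(s) = \Pr(Y_{i+1},\ldots,Y_n \mid X_i = s)$,

$$\Pr(X_i = s \mid Y_1,\ldots,Y_n) \;=\; \frac{f_i(s)\, b_i(s)}{\sum_{s'} f_i(s')\, b_i(s')},$$

where $f$ is computed by the forward recursion and $b$ by an analogous backward recursion.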

19 of 45

A hidden Markov model


Given multiple sequences of numbers (observations of Y), estimate parameters for the model

Solved with the Baum-Welch algorithm.
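A sketch of the idea in standard notation (mine, not the slides'): in the E-step, forward-backward computes posterior expected counts of transitions and emissions under the current parameters; in the M-step these counts are normalized to give updated parameters,

$$\hat{a}_{k\ell} = \frac{\mathbb{E}\big[\#\{\text{transitions } k \to \ell\}\big]}{\sum_{\ell'} \mathbb{E}\big[\#\{\text{transitions } k \to \ell'\}\big]}, \qquad \hat{e}_{k}(y) = \frac{\mathbb{E}\big[\#\{\text{emissions of } y \text{ from state } k\}\big]}{\sum_{y'} \mathbb{E}\big[\#\{\text{emissions of } y' \text{ from state } k\}\big]},$$

and iterating the two steps is an EM algorithm, so the likelihood does not decrease.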

20 of 45

A lattice view


Observed sequence

Hidden sequence

21 of 45

A two-state gene expression model


22 of 45

A hidden Markov model for gene expression during differentiation


EM algorithm

Dynamic programming

23 of 45

The Viterbi algorithm I


Observed sequence

Tij

  • Construct a matrix T where Tij is the probability of the most likely path ending in state i at position j.

24 of 45

The Viterbi algorithm II


Observed sequence

T’ij

  • Construct a matrix T’ with pointers recording the state from which the maximizing path arrived.

25 of 45

The Viterbi algorithm III


Observed sequence

Tij

  • Backtrack using T’
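Putting the three steps together, a compact sketch in Python (the model parameters are the same illustrative assumptions used above, not values from the slides):

```python
import numpy as np

states = ["A", "B"]
init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
emit = np.array([[1/6] * 6,
                 [1/10] * 5 + [1/2]])   # assumed emissions for symbols 1..6

def viterbi(obs):
    n, k = len(obs), len(states)
    T = np.zeros((k, n))                # T[i, j]: prob. of best path ending in state i at position j
    Tp = np.zeros((k, n), dtype=int)    # T'[i, j]: state the best path arrived from
    T[:, 0] = init * emit[:, obs[0] - 1]
    for j in range(1, n):
        for i in range(k):
            cand = T[:, j - 1] * trans[:, i] * emit[i, obs[j] - 1]
            Tp[i, j] = np.argmax(cand)
            T[i, j] = cand[Tp[i, j]]
    # Backtrack using T'
    path = [int(np.argmax(T[:, n - 1]))]
    for j in range(n - 1, 0, -1):
        path.append(int(Tp[path[-1], j]))
    return [states[i] for i in reversed(path)]

print(viterbi([1, 4, 3, 6, 6, 4]))
```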

26 of 45

Summary

Markov chain: a memoryless stochastic process

Hidden Markov model: a hidden Markov chain from which observations are generated.

Viterbi algorithm: dynamic programming algorithm for finding the most likely sequence of hidden states to have generated an observed sequence.

Baum-Welch algorithm: an expectation-maximization algorithm for learning parameters of a hidden Markov model.


27 of 45

Recall (Lecture 7)

  • In Lecture 15 we will discuss graphical models, which provide a useful language for describing models with latent variables.
  • Shaded circles are observed random variables; unshaded circles are latent random variables.
  • Parameters are shown in boxes.
  • Numbers in the bottom right of each box indicate the number of copies (these are called plates).
  • The edges encode conditional independence.


28 of 45

Hidden Markov models (HMMs) as graphical models


29 of 45

A chain of length 4

  • Consider the model of length 4:
    • The hidden states are binary.
    • There are six observed states.
    • Suppose that the initial probabilities are each 0.5.
    • There are therefore 12 unknown parameters: 4 − 2 = 2 free transition probabilities and 2 × (6 − 1) = 10 free emission probabilities.
  • The probability for each observed sequence of length 4 is a polynomial in the 12 unknowns. There are 1,296 (= 6^4) such polynomials.
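A small sketch making this concrete: for given parameter values (chosen arbitrarily here), the probability of one observed sequence of length 4 is obtained by summing over the 2^4 = 16 hidden paths, each term being a monomial in the initial, transition, and emission probabilities:

```python
import itertools
import numpy as np

init = np.array([0.5, 0.5])                       # fixed initial probabilities
trans = np.array([[0.7, 0.3], [0.4, 0.6]])        # 2 free parameters (illustrative values)
emit = np.array([[0.3, 0.2, 0.2, 0.1, 0.1, 0.1],  # 2 x 5 free parameters (illustrative values)
                 [0.05, 0.05, 0.1, 0.1, 0.2, 0.5]])

def prob_observed(obs):
    """Sum Pr(hidden path, observations) over all 2^len(obs) hidden paths."""
    total = 0.0
    for path in itertools.product([0, 1], repeat=len(obs)):
        p = init[path[0]] * emit[path[0], obs[0] - 1]
        for i in range(1, len(obs)):
            p *= trans[path[i - 1], path[i]] * emit[path[i], obs[i] - 1]
        total += p
    return total

# One of the 6^4 = 1,296 observable sequences of length 4:
print(prob_observed([1, 4, 3, 6]))
```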


30 of 45

Two complementary ways to think about graphical models

  • A set of conditional independence statements.
  • A factored probability distribution.
  • The directed Hammersley-Clifford theorem states that these two representations are equivalent (some rules and restrictions apply).
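For an HMM the factored form is standard (my notation, not from the slides): the joint distribution of hidden states $X_1,\ldots,X_n$ and observations $Y_1,\ldots,Y_n$ factors according to the directed graph as

$$\Pr(X_1,\ldots,X_n,\,Y_1,\ldots,Y_n) \;=\; \Pr(X_1)\,\Pr(Y_1 \mid X_1)\,\prod_{i=2}^{n} \Pr(X_i \mid X_{i-1})\,\Pr(Y_i \mid X_i),$$

which is equivalent to the conditional independence statements that each $X_i$ depends on the past only through $X_{i-1}$, and each $Y_i$ depends on everything else only through $X_i$.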


31 of 45

Application to gene finding


Is there a gene in this sequence?

32 of 45

Recall (Lecture 1)


isoforms

33 of 45

A hidden Markov model for gene finding


Observed sequence:

TAATATGTCCACGGTTGTACACGGCAGGTATTGAGGTATTGAGATGTAACTGAA

Predict genes by running the Viterbi algorithm for an HMM.

34 of 45

Hidden state space


The sparsity pattern for the transition matrix for hidden states

35 of 45

Observed sequence


36 of 45

Observed sequence


37 of 45

A generative view of the HMM


38 of 45

The lattice view


39 of 45

Intron length distribution in human


40 of 45

Waiting times in a Markov chain

  • Pr(leaving state) = p.
  • Pr(staying in state) = 1-p.
  • Pr(output of exactly r in state): (1 − p)^r · p.
  • This results in a geometric distribution, which matches intron length distributions.
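A quick simulation sketch (mine, not from the slides) checking that durations in a single self-looping state follow a geometric distribution; here r counts the number of self-transitions taken before leaving, which matches the (1 − p)^r · p form above:

```python
import numpy as np

p = 0.2                        # probability of leaving the state at each step
rng = np.random.default_rng(0)

def duration():
    """Number of self-transitions before leaving (geometric with success prob p)."""
    r = 0
    while rng.random() > p:    # stay with probability 1 - p
        r += 1
    return r

samples = np.array([duration() for _ in range(100_000)])
for r in range(5):
    print(r, (samples == r).mean(), (1 - p) ** r * p)  # empirical vs. (1-p)^r * p
```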


(Figure: a single state A with self-transition probability 1 − p and exit probability p, together with the resulting distribution of duration in the state.)

41 of 45

Recall (Lecture 10): geometric distribution I


42 of 45

Recall (Lecture 10): geometric distribution II


  • The probability distribution of the number of (independent, identically distributed) Bernoulli trials until one success.
  • Alternatively, the probability distribution of the number of (independent, identically distributed) Bernoulli trials until one failure. In this case p must be replaced by 1-p in the distribution (and resultant formulas).
  • The lengths of certain genomic features, such as introns, follow an (approximately) geometric distribution.
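For concreteness (standard formulas, not from the slide): in the "trials until one success" parameterization with success probability $p$,

$$\Pr(N = k) = (1-p)^{k-1}\, p, \qquad k = 1, 2, 3, \ldots, \qquad \mathbb{E}[N] = \frac{1}{p},$$

and the alternative version is obtained by swapping the roles of $p$ and $1-p$.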

43 of 45

A full gene-finding HMM


44 of 45

Summary

Hidden Markov models are useful for annotating features in sequences, or for annotating time series data.

Graphical models: hidden Markov models are special instances of graphical models.

Model choice: while HMMs are restrictive in that they make a strong Markovian assumption for the hidden random variables, associated inference algorithms are efficient.

Waiting times in a Markov chain are geometrically distributed, which is suitable for modeling intron lengths. Exon lengths require an extension of HMMs called generalized HMMs.


45 of 45

Additional References
