1 of 45

Hidden Markov models

Lior Pachter

California Institute of Technology


Lecture 15

Caltech Bi/BE/CS183

Spring 2023

These slides are distributed under the CC BY 4.0 license

2 of 45

Laws of motion

  • The only mention of time so far in the class was in reference to Gauss’ discovery of the method of least squares while performing a tour-de-force calculation to estimate the orbit of Ceres. This involved estimating 7 parameters from 19 noisy measurements (Lecture 3).
  • In physics the relevance of time is apparent in the differential equation $F = m\,\frac{d^2x}{dt^2}$ (Newton's second law).
  • Cell states change with time just like planet positions do. What are the laws of motion for cells?


3 of 45

Single-cell snapshots of cellular motion


4 of 45

Measuring cell states across time


5 of 45

Pseudotime assignment


Paul Magwene

6 of 45

Pseudotime assignment


Paul Magwene

7 of 45

Measuring cell states across time


Determining when genes are active

Hidden Markov model

8 of 45

Hidden Markov model


EM algorithm

Dynamic programming

9 of 45

A (discrete time) Markov chain

  • A discrete time Markov chain is a sequence of random variables $X_1, X_2, \ldots, X_n$ with the Markov property, which means that the probability that some $X_i$ is in a certain state depends only on the state of the previous random variable $X_{i-1}$: $\Pr(X_i = x \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}) = \Pr(X_i = x \mid X_{i-1} = x_{i-1})$ (a simulation sketch follows below).
  • The values that the random variables take on are called states, and the set of states is called the state space.
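A minimal sketch (not from the slides) of simulating such a chain in Python; the two states, the transition matrix, and its values are illustrative assumptions:

```python
import numpy as np

# Hypothetical two-state chain with states "A" and "B" (values are illustrative).
states = ["A", "B"]
P = np.array([[0.9, 0.1],   # Pr(next state | current state A)
              [0.2, 0.8]])  # Pr(next state | current state B)

rng = np.random.default_rng(0)

def simulate(n, start=0):
    """Sample X_1, ..., X_n; each X_i depends only on X_{i-1} (Markov property)."""
    x = [start]
    for _ in range(n - 1):
        x.append(rng.choice(2, p=P[x[-1]]))
    return [states[i] for i in x]

print("".join(simulate(10)))
```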


10 of 45

A (discrete time) Markov chain

  • Homogeneous Markov chains are processes where the transition probabilities are constant: $\Pr(X_{i+1} = x \mid X_i = y) = \Pr(X_i = x \mid X_{i-1} = y)$ for all $i$.
  • Stationary Markov chains satisfy $\Pr(X_i = x) = \Pr(X_{i+1} = x)$ for all $i$ (see the sketch below).
  • A Markov chain can have memory, which means that the current state can depend on a finite number (> 1) of previous states.
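A short sketch, under an assumed transition matrix, of computing the stationary distribution $\pi$ (satisfying $\pi P = \pi$) of a homogeneous two-state chain:

```python
import numpy as np

# Assumed transition matrix for a homogeneous two-state chain (rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# The stationary distribution is the left eigenvector of P with eigenvalue 1,
# normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
print(pi)  # ~[0.667, 0.333]; if X_1 ~ pi, then Pr(X_i = x) is the same for all i
```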


11 of 45

A two state Markov chain


12 of 45

A two state Markov chain


BAABAA

13 of 45

A lattice view


14 of 45

A two-state gene expression model


15 of 45

A hidden Markov model


16 of 45

A hidden Markov model


What is the probability of observing the sequence 1,4,3,6,6,4?

Solved with the forward algorithm.
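A minimal sketch of the forward algorithm in Python. The two hidden states A and B, the die-like emissions 1–6, and all parameter values below are illustrative assumptions, not the ones from the slides:

```python
import numpy as np

states = ["A", "B"]
init = np.array([0.5, 0.5])           # assumed initial distribution
trans = np.array([[0.9, 0.1],         # assumed Pr(next hidden state | current)
                  [0.2, 0.8]])
# assumed emission probabilities for symbols 1..6 in each hidden state
emit = np.array([[1/6] * 6,
                 [1/10] * 5 + [1/2]])

def forward_likelihood(obs):
    """Pr(Y_1..Y_n = obs), summing over all hidden paths in O(n * #states^2) time."""
    f = init * emit[:, obs[0] - 1]                # f[k] = Pr(Y_1, X_1 = k)
    for y in obs[1:]:
        f = (f @ trans) * emit[:, y - 1]          # forward recursion
    return f.sum()

print(forward_likelihood([1, 4, 3, 6, 6, 4]))
```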

17 of 45

A hidden Markov model


What is the most likely sequence of states of the Markov chain to have resulted in 1,4,3,6,6,4?

Solved with the Viterbi algorithm.

18 of 45

A hidden Markov model


What is the probability that X3 = B given Y0 = 1, Y1 = 4, Y2 = 3, Y3 = 6, Y4 = 6, Y5 = 4?

Solved with the forward-backward algorithm.
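This posterior has a standard expression in terms of forward and backward quantities (my notation, not from the slides): writing $f_i(s) = \Pr(Y_1,\ldots,Y_i,\, X_i = s)$ and $b_i(s) = \Pr(Y_{i+1},\ldots,Y_n \mid X_i = s)$,

$$\Pr(X_i = s \mid Y_1,\ldots,Y_n) \;=\; \frac{f_i(s)\, b_i(s)}{\sum_{s'} f_i(s')\, b_i(s')},$$

where $f$ is computed by the forward recursion and $b$ by an analogous backward recursion.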

19 of 45

A hidden Markov model


Given multiple sequences of numbers (observations of Y), estimate parameters for the model

Solved with the Baum-Welch algorithm.
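A sketch of the idea in standard notation (mine, not the slides'): in the E-step, forward-backward computes posterior expected counts of transitions and emissions under the current parameters; in the M-step these counts are normalized to give updated parameters,

$$\hat{a}_{k\ell} = \frac{\mathbb{E}\big[\#\{\text{transitions } k \to \ell\}\big]}{\sum_{\ell'} \mathbb{E}\big[\#\{\text{transitions } k \to \ell'\}\big]}, \qquad \hat{e}_{k}(y) = \frac{\mathbb{E}\big[\#\{\text{emissions of } y \text{ from state } k\}\big]}{\sum_{y'} \mathbb{E}\big[\#\{\text{emissions of } y' \text{ from state } k\}\big]},$$

and iterating the two steps is an EM algorithm, so the likelihood does not decrease.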

20 of 45

A lattice view


Observed sequence

Hidden sequence

21 of 45

A two-state gene expression model


22 of 45

A hidden Markov model for gene expression during differentiation


EM algorithm

Dynamic programming

23 of 45

The Viterbi algorithm I


Observed sequence

Tij

  • Construct a matrix T where Tij is the probability of the most likely path ending in state i at position j.

24 of 45

The Viterbi algorithm II


Observed sequence

T’ij

  • Construct a matrix T’ with pointers recording the state from which the maximizing path arrived.

25 of 45

The Viterbi algorithm III


Observed sequence

Tij

  • Backtrack using T’
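Putting the three steps together, a compact sketch in Python (the model parameters are the same illustrative assumptions used above, not values from the slides):

```python
import numpy as np

states = ["A", "B"]
init = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
emit = np.array([[1/6] * 6,
                 [1/10] * 5 + [1/2]])   # assumed emissions for symbols 1..6

def viterbi(obs):
    n, k = len(obs), len(states)
    T = np.zeros((k, n))                # T[i, j]: prob. of best path ending in state i at position j
    Tp = np.zeros((k, n), dtype=int)    # T'[i, j]: state the best path arrived from
    T[:, 0] = init * emit[:, obs[0] - 1]
    for j in range(1, n):
        for i in range(k):
            cand = T[:, j - 1] * trans[:, i] * emit[i, obs[j] - 1]
            Tp[i, j] = np.argmax(cand)
            T[i, j] = cand[Tp[i, j]]
    # Backtrack using T'
    path = [int(np.argmax(T[:, n - 1]))]
    for j in range(n - 1, 0, -1):
        path.append(int(Tp[path[-1], j]))
    return [states[i] for i in reversed(path)]

print(viterbi([1, 4, 3, 6, 6, 4]))
```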

26 of 45

Summary

Markov chain: a memoryless stochastic process

Hidden Markov model: a hidden Markov chain from which observations are generated.

Viterbi algorithm: dynamic programming algorithm for finding the most likely sequence of hidden states to have generated an observed sequence.

Baum-Welch algorithm: an expectation-maximization algorithm for learning parameters of a hidden Markov model.


27 of 45

Recall (Lecture 7)

  • In Lecture 15 we will discuss graphical models, which provide a useful language for describing models with latent variables.
  • Shaded circles are observed random variables; unshaded circles are latent random variables.
  • Parameters are shown in boxes.
  • Numbers in the bottom right of each box indicate the number of copies (these are called plates).
  • The edges encode conditional independence.


28 of 45

Hidden Markov models (HMMs) as graphical models


29 of 45

A chain of length 4

  • Consider the model of length 4:
    • The hidden states are binary.
    • There are six observed states.
    • Suppose that the initial probabilities are each 0.5.
    • There are therefore 12 unknown parameters: 4 − 2 = 2 free transition probabilities and 2 × (6 − 1) = 10 free emission probabilities.
  • The probability for each observed sequence of length 4 is a polynomial in the 12 unknowns. There are 1,296 (= 6^4) such polynomials.
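A small sketch making this concrete: for given parameter values (chosen arbitrarily here), the probability of one observed sequence of length 4 is obtained by summing over the 2^4 = 16 hidden paths, each term being a monomial in the initial, transition, and emission probabilities:

```python
import itertools
import numpy as np

init = np.array([0.5, 0.5])                       # fixed initial probabilities
trans = np.array([[0.7, 0.3], [0.4, 0.6]])        # 2 free parameters (illustrative values)
emit = np.array([[0.3, 0.2, 0.2, 0.1, 0.1, 0.1],  # 2 x 5 free parameters (illustrative values)
                 [0.05, 0.05, 0.1, 0.1, 0.2, 0.5]])

def prob_observed(obs):
    """Sum Pr(hidden path, observations) over all 2^len(obs) hidden paths."""
    total = 0.0
    for path in itertools.product([0, 1], repeat=len(obs)):
        p = init[path[0]] * emit[path[0], obs[0] - 1]
        for i in range(1, len(obs)):
            p *= trans[path[i - 1], path[i]] * emit[path[i], obs[i] - 1]
        total += p
    return total

# One of the 6^4 = 1,296 observable sequences of length 4:
print(prob_observed([1, 4, 3, 6]))
```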


30 of 45

Two complementary ways to think about graphical models

  • A set of conditional independence statements.
  • A factored probability distribution.
  • The directed Hammersley-Clifford theorem states that these two representations are equivalent (some rules and restrictions apply).
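For an HMM the factored form is standard (my notation, not from the slides): the joint distribution of hidden states $X_1,\ldots,X_n$ and observations $Y_1,\ldots,Y_n$ factors according to the directed graph as

$$\Pr(X_1,\ldots,X_n,\,Y_1,\ldots,Y_n) \;=\; \Pr(X_1)\,\Pr(Y_1 \mid X_1)\,\prod_{i=2}^{n} \Pr(X_i \mid X_{i-1})\,\Pr(Y_i \mid X_i),$$

which is equivalent to the conditional independence statements that each $X_i$ depends on the past only through $X_{i-1}$, and each $Y_i$ depends on everything else only through $X_i$.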


31 of 45

Application to gene finding


Is there a gene in this sequence?

32 of 45

Recall (Lecture 1)


isoforms

33 of 45

A hidden Markov model for gene finding


Observed sequence:

TAATATGTCCACGGTTGTACACGGCAGGTATTGAGGTATTGAGATGTAACTGAA

Predict genes by running the Viterbi algorithm for an HMM.

34 of 45

Hidden state space


The sparsity pattern for the transition matrix for hidden states

35 of 45

Observed sequence


36 of 45

Observed sequence


37 of 45

A generative view of the HMM


38 of 45

The lattice view


39 of 45

Intron length distribution in human


40 of 45

Waiting times in a Markov chain

  • Pr(leaving state) = p.
  • Pr(staying in state) = 1-p.
  • Pr(output of exactly r in state): (1 − p)^r · p.
  • This results in a geometric distribution, which matches intron length distributions.
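A quick simulation sketch (mine, not from the slides) checking that durations in a single self-looping state follow a geometric distribution; here r counts the number of self-transitions taken before leaving, which matches the (1 − p)^r · p form above:

```python
import numpy as np

p = 0.2                        # probability of leaving the state at each step
rng = np.random.default_rng(0)

def duration():
    """Number of self-transitions before leaving (geometric with success prob p)."""
    r = 0
    while rng.random() > p:    # stay with probability 1 - p
        r += 1
    return r

samples = np.array([duration() for _ in range(100_000)])
for r in range(5):
    print(r, (samples == r).mean(), (1 - p) ** r * p)  # empirical vs. (1-p)^r * p
```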


(Figure: a single state A with self-transition probability 1 − p and exit probability p, together with the resulting distribution of duration in the state.)

41 of 45

Recall (Lecture 10): geometric distribution I


42 of 45

Recall (Lecture 10): geometric distribution II


  • The probability distribution of the number of (independent, identically distributed) Bernoulli trials until one success.
  • Alternatively, the probability distribution of the number of (independent, identically distributed) Bernoulli trials until one failure. In this case p must be replaced by 1-p in the distribution (and resultant formulas).
  • The lengths of certain genomic features, such as introns, follow an (approximately) geometric distribution.
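For concreteness (standard formulas, not from the slide): in the "trials until one success" parameterization with success probability $p$,

$$\Pr(N = k) = (1-p)^{k-1}\, p, \qquad k = 1, 2, 3, \ldots, \qquad \mathbb{E}[N] = \frac{1}{p},$$

and the alternative version is obtained by swapping the roles of $p$ and $1-p$.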

43 of 45

A full gene-finding HMM


44 of 45

Summary

Hidden Markov models are useful for annotating features in sequences, or for annotating time series data.

Graphical models: hidden Markov models are special instances of graphical models.

Model choice: while HMMs are restrictive in that they make a strong Markovian assumption for the hidden random variables, associated inference algorithms are efficient.

Waiting times in a Markov chain are geometrically distributed, which is suitable for modeling intron lengths. Exon lengths require an extension of HMMs called generalized HMMs.


45 of 45

Additional References
