1 of 31

Cell Dynamics from Snapshot Diagrams

Lior Pachter

California Institute of Technology


CCAAGS ‘22

University of Washington

June 30, 2022

2 of 31

Happy birthday Bernd!!

  • Page limit (30 slides).
  • Time limit (45 minutes).
  • No “hanging chads”.
  • Slides have been read out loud.
  • Researched which coffee shop opens earliest around UW.


3 of 31

Amgen’s KINERET approved for COVID-19


4 of 31

KINERET (Anakinra)


5 of 31

Chemical reaction network analysis of KINERET


Gilles Gnacadja

KINERET :

6 of 31

“I don’t believe in real numbers”


7 of 31

This is an example of a toric dynamical system


Anne Shiu

The equations admit a strictly positive solution (Craciun et al., 2009).

The paper provides a “dictionary” between chemical reaction theory, toric dynamical systems, toric (statistical) models, and algebraic geometry.

Alicia Dickenstein

Gheorghe Craciun

Bernd Sturmfels

8 of 31

Chemical reaction networks in biology

  • Monod’s equation (1952): Equivalent to Michaelis-Menten kinetics (not a toric dynamical system).
  • Gorini and Maas (1957): Kinetics of enzymes in E. coli.
  • Franklin, Watson and Crick (1953): Structure of DNA.
  • Crick (1958): The central dogma.
  • Jacob and Monod (1961): DNA transcription results in control of enzyme levels.


Jacques Monod

9 of 31

The central dogma and “gene expression”


Francis Crick

10 of 31

Transcription as a chemical reaction network

  • The simplest approach to modeling transcription is to model the process with mass action differential equations. Say k is a transcription rate and γ is a degradation rate. Then, writing X for the RNA level:
      $$\frac{dX}{dt} = k - \gamma X$$
  • Note that this model is continuous, and does not account for the fact that molecule counts are non-negative integers*.
  • There is no stochasticity in the model.

* Until “recently” this has been a moot point, because it was difficult to perform single-molecule experiments. Fluorescent in situ hybridization (FISH) was developed in the 1970s, and single-cell genomics experiments have been possible for about a decade.
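As a minimal sketch of the deterministic model, assume the mass action equation is dX/dt = k - γX with X(0) = x0 (the symbol X and the initial condition are assumptions made here). The closed-form solution relaxes exponentially to the steady state k/γ, and a forward-Euler integration serves as a numerical cross-check. The function names are illustrative only.

```python
import math

def transcript_level(t, k, gamma, x0=0.0):
    """Closed-form solution of dX/dt = k - gamma * X with X(0) = x0."""
    steady_state = k / gamma
    return steady_state + (x0 - steady_state) * math.exp(-gamma * t)

def euler_integrate(k, gamma, x0=0.0, t_end=10.0, dt=1e-3):
    """Forward-Euler integration of the same ODE, as a numerical cross-check."""
    x = x0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        x += dt * (k - gamma * x)
    return x

# Hypothetical rates: k = 10 molecules per unit time, gamma = 1 per unit time.
exact = transcript_level(10.0, k=10.0, gamma=1.0)
approx = euler_integrate(k=10.0, gamma=1.0, t_end=10.0)
```

Both values sit close to k/γ = 10 by t = 10, and neither is restricted to integers, which is the limitation the slide points out.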


11 of 31

Variation in gene expression measurements


12 of 31

A question

  • Is the observed variability due to:
    • technical noise?
    • biological stochasticity?


13 of 31

They say social issues are irrelevant to mathematics…


Quantitative Biology

A stochastic model for bursty transcription coupling the telegraph model to a naive RNA transcription model yields a steady state negative binomial distribution for molecule counts.

Computational Biology

An overdispersed sampling model for RNA molecule counts from single-cell RNA-seq due to technical variation yields a negative binomial distribution for molecule counts.

14 of 31

What exactly is the data and how is it collected?


mini laboratory

15 of 31

Example: the inDrops approach


16 of 31

Beads, Cells and Droplets


[Figure: outcomes of bead and cell encapsulation in droplets. Labels: Split, Doublet, No capture, Goal, Collision; Good, Bad, Irrelevant.]

17 of 31

Single-cell RNA-seq

  • Single-cell RNA-seq refers to a group of (constantly improving) technologies and analysis tools that
    • Start with an INPUT of cells,
    • OUTPUT a (proxy for a) gene expression matrix.


cells

genes

18 of 31

The variance is quadratic in the mean

  • The negative binomial distribution yields a variance that is quadratic in the mean, namely $\sigma^2 = \mu + \Phi\mu^2$, where μ is the mean and Φ is a parameter called the dispersion or shape parameter.
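The quadratic mean-variance relationship can be checked numerically. The sketch below assumes the common parameterization in which the variance is μ + Φμ², so that the size parameter is r = 1/Φ; the function names and the example values μ = 10, Φ = 0.5 are chosen here for illustration.

```python
import math

def nb_pmf(x, mu, phi):
    """Negative binomial pmf with mean mu and dispersion phi (variance mu + phi*mu^2)."""
    r = 1.0 / phi  # size parameter implied by the dispersion
    log_p = (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))
    return math.exp(log_p)

def nb_moments(mu, phi, x_max=5000):
    """Mean and variance obtained by summing the pmf over 0..x_max."""
    probs = [nb_pmf(x, mu, phi) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

mean, var = nb_moments(mu=10.0, phi=0.5)
# Under the assumed parameterization: var = 10 + 0.5 * 10**2 = 60
```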


19 of 31

Negative binomial distribution II


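  • The Poisson distribution arises as the limit of the negative binomial distribution when the stopping parameter r goes to infinity while the mean λ is held fixed:
      $$\lim_{r\to\infty} \binom{x+r-1}{x}\left(\frac{\lambda}{r+\lambda}\right)^{x}\left(\frac{r}{r+\lambda}\right)^{r} = \frac{\lambda^{x}e^{-\lambda}}{x!}$$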
  • The negative binomial distribution arises as a mixture of Poisson distributions via a hierarchical model where the Poisson parameter is a Gamma-distributed random variable. For this reason it is also called the Gamma-Poisson distribution.
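The hierarchical construction can be sketched with the standard library alone. The sketch assumes a Gamma distribution with shape r and scale μ/r for the Poisson parameter, which gives counts with mean μ and variance μ + μ²/r. The helper names, the seed, and the values μ = 5, r = 2 are illustrative assumptions.

```python
import random
import statistics

def sample_poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def sample_gamma_poisson(mu, r, n, seed=0):
    """Draw n counts from the Gamma-Poisson mixture with mean mu and size r."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        lam = rng.gammavariate(r, mu / r)  # shape r, scale mu/r, so E[lam] = mu
        draws.append(sample_poisson(lam, rng))
    return draws

counts = sample_gamma_poisson(mu=5.0, r=2.0, n=20000)
sample_mean = statistics.fmean(counts)
sample_var = statistics.variance(counts)
# Theory for the mixture: mean 5, variance 5 + 5**2 / 2 = 17.5
```

The sample variance lands well above the sample mean, which is the overdispersion relative to the Poisson case that the mixture introduces.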

Computational biologists think that negative binomially distributed counts arise from capture variability among cells.

20 of 31

Variance stabilizing transformations for negative binomial data

  • Anscombe (1948) considered transformations for the case when the mean m of the negative binomial distribution is large but k, the exponent, is fixed. In this case he derived the variance stabilizing transformation
      $$y = \sinh^{-1}\sqrt{\frac{x+c}{k+d}}$$
    where d = -2c and $c \approx 3/8$.
  • Here x is the measurement, and k is the exponent, which is another word for the inverse of the shape parameter of the negative binomial distribution.
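To see what variance stabilization means in practice, the sketch below computes the variance of raw and transformed counts by direct summation over the negative binomial pmf. It assumes the transformation has the form asinh(sqrt((x + c)/(k - 2c))) with c = 3/8, and that the negative binomial has mean m and variance m + m²/k. The values k = 2 and m = 30, 300 are arbitrary choices for illustration.

```python
import math

def nb_pmf(x, m, k):
    """Negative binomial pmf with mean m and exponent k (variance m + m^2/k)."""
    log_p = (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
             + k * math.log(k / (k + m)) + x * math.log(m / (k + m)))
    return math.exp(log_p)

def anscombe(x, k, c=0.375):
    """Assumed form of Anscombe's transformation, with d = -2c."""
    return math.asinh(math.sqrt((x + c) / (k - 2.0 * c)))

def variance_under_nb(f, m, k, x_max=20000):
    """Variance of f(X) for X ~ NB(m, k), by direct summation of the pmf."""
    probs = [nb_pmf(x, m, k) for x in range(x_max + 1)]
    mean = sum(f(x) * p for x, p in enumerate(probs))
    return sum((f(x) - mean) ** 2 * p for x, p in enumerate(probs))

k = 2.0
raw = {m: variance_under_nb(lambda x: x, m, k) for m in (30.0, 300.0)}
stabilized = {m: variance_under_nb(lambda x: anscombe(x, k), m, k)
              for m in (30.0, 300.0)}
```

The raw variance grows by roughly two orders of magnitude between the two means, while the transformed variance stays on the same scale.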


21 of 31

Variance stabilizing transformations for negative binomial data

  • Anscombe also showed that a good approximation to the transformation $y = \sinh^{-1}\sqrt{(x+c)/(k+d)}$ is given by $y = \log\left(x + \tfrac{k}{2}\right)$.
  • The term k/2 is called a pseudocount.
  • This is why it makes sense to log transform count data.
  • Computational biologists start their single-cell RNA-seq analysis by log-transforming the data to “normalize” it in order to remove the “noise”.
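One way to read “good approximation” is that the inverse hyperbolic sine transformation and the log transformation with pseudocount k/2 differ by an affine map, which does not affect variance stabilization. The sketch below assumes the forms asinh(sqrt((x + c)/(k - 2c))) with c = 3/8 and (1/2) log(x + k/2), and checks numerically that their difference approaches a constant as x grows. The constant log 2 - (1/2) log(k - 2c) follows from asinh(z) ≈ log(2z) for large z.

```python
import math

def anscombe(x, k, c=0.375):
    """Assumed form of Anscombe's transformation, with d = -2c."""
    return math.asinh(math.sqrt((x + c) / (k - 2.0 * c)))

def half_log(x, k):
    """Log transformation with pseudocount k/2, scaled by 1/2."""
    return 0.5 * math.log(x + k / 2.0)

def offset(x, k):
    """Difference between the two transformations at count x."""
    return anscombe(x, k) - half_log(x, k)

k = 2.0
limit = math.log(2.0) - 0.5 * math.log(k - 2.0 * 0.375)
gaps = [abs(offset(x, k) - limit) for x in (10, 100, 1000, 10000)]
```

The gaps shrink rapidly with x, so for large counts the two transformations agree up to a shift and a factor of 2.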


22 of 31

A view from the “other” side


Quantitative Biology

A stochastic model for bursty transcription coupling the telegraph model to a naïve RNA transcription model yields a steady state negative binomial distribution for molecule counts.

Computational Biology

An overdispersed sampling model for RNA molecule counts from single-cell RNA-seq due to technical variation yields a negative binomial distribution for molecule counts.

23 of 31

Stochastic chemical reaction networks: the chemical master equation


The number of molecules is a state in a continuous-time Markov chain. Transcription is modeled as a Poisson process. Simulations can be performed with the Gillespie algorithm.

Multiple simulations provide a distribution of trajectory states at a fixed time.

Theorem: The stationary distribution is the Poisson distribution with mean k/γ.
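A minimal Gillespie sketch for this model, assuming transcription at constant rate k and degradation of each molecule at rate γ, is given below. It simulates many independent cells up to a fixed time and summarizes the snapshot. The function names, the seed, and the rates k = 10, γ = 1 are illustrative assumptions.

```python
import random
import statistics

def gillespie_birth_death(k, gamma, t_end, rng, x0=0):
    """Return the molecule count at time t_end for one simulated cell."""
    t, x = 0.0, x0
    while True:
        death = gamma * x          # total degradation propensity
        total = k + death          # k is the transcription propensity
        t += rng.expovariate(total)
        if t > t_end:
            return x
        if rng.random() * total < death:
            x -= 1                 # degradation event
        else:
            x += 1                 # transcription event

def snapshot(k, gamma, t_end, n_cells, seed=0):
    """Counts across independent cells at a fixed time."""
    rng = random.Random(seed)
    return [gillespie_birth_death(k, gamma, t_end, rng) for _ in range(n_cells)]

counts = snapshot(k=10.0, gamma=1.0, t_end=8.0, n_cells=1500)
mean = statistics.fmean(counts)
fano = statistics.variance(counts) / mean  # close to 1 for a Poisson distribution
```

A Fano factor near 1 together with a mean near k/γ is what a Poisson stationary distribution predicts for the snapshot.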

24 of 31

A stochastic model of bursty transcription

  • A multivariate Markov chain that combines a switching gene with a transcription + degradation process.
  • Whenever the gene is in the “on” state, it can transcribe at rate $k_i$. The resultant RNA molecules are degraded at rate γ.
  • In this system constitutive RNA production is coupled to a telegraph system.
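The coupled system can be simulated with the same Gillespie approach. In the sketch below the switching rates k_on and k_off are names assumed here (the slide names only the transcription rate and γ), and the parameter values are arbitrary choices that make the gene spend most of its time “off”.

```python
import random
import statistics

def gillespie_telegraph(k_on, k_off, k_i, gamma, t_end, rng):
    """Return the RNA count at time t_end, starting from gene off and no RNA."""
    t, gene_on, x = 0.0, False, 0
    while True:
        degrade = gamma * x
        transcribe = k_i if gene_on else 0.0   # production only while "on"
        switch = k_off if gene_on else k_on    # gene toggles its state
        total = degrade + transcribe + switch
        t += rng.expovariate(total)
        if t > t_end:
            return x
        u = rng.random() * total
        if u < degrade:
            x -= 1
        elif u < degrade + transcribe:
            x += 1
        else:
            gene_on = not gene_on

rng = random.Random(0)
counts = [gillespie_telegraph(0.5, 5.0, 50.0, 1.0, 10.0, rng) for _ in range(1500)]
mean = statistics.fmean(counts)
fano = statistics.variance(counts) / mean
expected_mean = 50.0 * 0.5 / (0.5 + 5.0) / 1.0  # k_i * P(on) / gamma
```

With these rates the Fano factor comes out well above 1, in contrast to the constitutive model, where it is close to 1.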


[Diagram: telegraph gene states (on, off) coupled to RNA production and degradation (Ø, X).]

25 of 31

Limiting case: bursty transcription

  • Distributions approach the negative binomial limit, which is heavy-tailed compared to the Poisson distribution.
  • At each burst, the number of RNA molecules grows quickly, then gradually decays.
  • Thus, a quantitative biologist sees a negative binomial distribution on counts as a stationary distribution resulting from solving a CME.
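For reference, a sketch of the limiting distribution under the usual bursty scaling is written out below. The symbols k_on, k_off, a, and b are labels assumed here; the slide itself does not name them. The resulting variance has the quadratic form μ + Φμ² with Φ = 1/a.

```latex
% Bursty limit of the telegraph model (sketch; k_on, k_off, a, b are assumed labels).
% Take k_off and k_i to infinity with the mean burst size b = k_i / k_off held fixed.
P(X = x) = \frac{\Gamma(x + a)}{\Gamma(a)\, x!}
           \left(\frac{b}{1+b}\right)^{x} \left(\frac{1}{1+b}\right)^{a},
\qquad a = \frac{k_{\mathrm{on}}}{\gamma}, \quad b = \frac{k_i}{k_{\mathrm{off}}}
% Mean and variance of this negative binomial distribution:
\mu = a b, \qquad \sigma^2 = a b (1 + b) = \mu + \frac{\mu^2}{a}
```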


26 of 31

Throwing the biology out with the noise

  • Distributions approach the negative binomial limit, which is heavy-tailed compared to the Poisson distribution.
  • At each burst, the number of RNA molecules grows quickly, then gradually decays.
  • Thus, a quantitative biologist sees a negative binomial distribution on counts as a stationary distribution resulting from solving a CME.
  • Computational biologists start their single-cell RNA-seq analysis by log-transforming the data to “normalize” it in order to remove the “noise”.


27 of 31

Quantitative Biology meets Computational Biology: RNA velocity

  • The term “RNA velocity” was introduced in (La Manno et al., 2018).


28 of 31

Quantitative Biology meets Computational Biology: RNA velocity

  • An RNA velocity image is obtained by pooling cells and averaging across cells to identify velocity directions.
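A heavily simplified sketch of how pooling across cells can yield per-cell velocities for one gene is given below. It assumes a steady-state model in which unspliced counts u and spliced counts s satisfy u = γs at equilibrium, fits γ by least squares through the origin over all cells, and takes the residual u - γs as the velocity. This is an illustration under those assumptions, not the procedure of any particular package; the function names and toy data are hypothetical.

```python
def fit_gamma(spliced, unspliced):
    """Least-squares slope through the origin of unspliced vs. spliced, pooled over cells."""
    numerator = sum(s * u for s, u in zip(spliced, unspliced))
    denominator = sum(s * s for s in spliced)
    return numerator / denominator

def velocities(spliced, unspliced):
    """Per-cell velocity for one gene: residual from the pooled steady-state line."""
    gamma = fit_gamma(spliced, unspliced)
    return [u - gamma * s for s, u in zip(spliced, unspliced)]

# Toy counts for one gene in four cells (hypothetical data).
spliced = [1.0, 2.0, 3.0, 4.0]
unspliced = [1.0, 1.0, 1.5, 2.0]
v = velocities(spliced, unspliced)
```

A positive entry of v means the cell has more unspliced RNA than the pooled steady-state line predicts, which this model reads as expression of the gene increasing.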


29 of 31

Implementations of RNA velocity produce inconsistent results

velocyto

scVelo

Gennady Gorin

Tara Chari

Meichen Fang

30 of 31

RNA velocity produces results inconsistent with biology

  • Mouse pancreatic endocrinogenesis: scVelo implementation of velocyto (Bergen et al., 2020)
  • Mouse neurogenesis: scVelo (Li et al., 2021)
  • Mouse inner ear development: scVelo implementation of velocyto (Gupta et al., 2021)
  • Mouse embryonic fibroblast reprogramming: scVelo (Lange et al., 2022)
  • Human hematopoiesis: scVelo (Qiu et al., 2022)
  • Mouse pancreatic endocrinogenesis: velocyto (Lange et al., 2022)

31 of 31

A challenge for you (Bernd’s community)

  • In addition to standard single-cell RNA-seq processing…
    • Alignment.
    • Normalization.
    • Neighborhood-graph construction.
  • …modeling of spliced and unspliced counts to estimate “velocity” vectors for cells.
  • Dimension reduction to visualize results.

Prove theorems that can guide this endeavour!
