1 of 88

The Physics of Transcriptomes

MCB137L/237L

Spring 2025

2 of 88

What Does the mRNA Distribution Tell Us About How Transcription Happens?

Zenklusen et al. (2008)

3 of 88

Testing the Null Hypothesis: Deviations from Poisson Reveal Molecular Mechanism

Zenklusen et al. (2008)

4 of 88

And, Yet, Our Ignorance is Vast

5 of 88

Regulatory Ignorance Throughout Our Model Organisms

6 of 88

The Cell as a Bag of RNA

7 of 88

Technologies Driving the DNA Sequencing Revolution

8 of 88

From Gene-Wide to Genome-Wide Studies

9 of 88

A Deluge of Sequencing Data

10 of 88

A Feeling for the Scale of Our Sequencing Data

11 of 88

Estimating Book Lengths

12 of 88

Our Collective Memory: The Library of Congress

13 of 88

14 of 88

15 of 88

Number of Letters and Their Meaning

16 of 88

The Sequence Read Archive Versus Shakespeare

17 of 88

18 of 88

19 of 88

Fidelity in biological polymerization: Key question, are we surprised?

20 of 88

The Insufficiency of Equilibrium Molecular Recognition

21 of 88

A Toy Model of Translation

22 of 88

The Kinetic Proofreading Idea: Energy to Fuel Error Correction

23 of 88

It’s Not Just About Information Amount, It’s Also About Information Density

24 of 88

Predictive Understanding of Cellular Decision Making Through the Theory-Experiment Dialogue

25 of 88

Precision Measurements to Fuel the Theory-Experiment Dialogue: Measuring Protein Expression

26 of 88

Precision Measurements to Fuel the Theory-Experiment Dialogue: Measuring mRNA Expression

27 of 88

Perrin’s Take on Precision Measurements and Reproducibility

28 of 88

The Meaning of Precision Measurements

  • Example of LIGO

29 of 88

Demanding Quantitative Agreement Between Measurements: The Example of Mass Spectrometry

30 of 88

Demanding Quantitative Agreement Between Measurements: Flow Cytometry vs. Microscopy

31 of 88

Demanding Quantitative Agreement: smFISH vs. Enzymatic Assays vs. Microscopy

32 of 88

Querying the Transcriptome at the Single Cell Level

33 of 88

Querying the Transcriptome at the Single Cell Level

34 of 88

The Single-Cell Sequencing Revolution

  • 44,494 cells and 10,000 genes measured
  • How do you reduce this 10,000-dimensional data to 2 dimensions?
  • How do you identify cell types?

The Tabula Muris project

35 of 88

Assume That We Have a Constitutive Promoter

36 of 88

The mRNA Distribution in Space and Time

37 of 88

The Poisson Distribution Is Fully Determined by One Parameter
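
A minimal sketch of what "one parameter" means in practice: once the mean is fixed, the whole distribution follows, and the variance equals the mean (Fano factor of 1). The mean of 15 mRNA per cell is borrowed from the low expression level quoted later in the deck; the function and variable names are illustrative choices.

```python
import math

def poisson_pmf(m, mean):
    """Probability of finding m mRNA molecules when the mean count is `mean`."""
    return mean**m * math.exp(-mean) / math.factorial(m)

mean_mrna = 15.0          # the single parameter of the distribution
counts = range(0, 100)    # truncate the sum far out in the tail
probs = [poisson_pmf(m, mean_mrna) for m in counts]

total = sum(probs)
mean = sum(m * p for m, p in zip(counts, probs))
variance = sum((m - mean) ** 2 * p for m, p in zip(counts, probs))
fano = variance / mean    # equals 1 for a Poisson distribution

print(f"normalization = {total:.6f}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}, Fano factor = {fano:.3f}")
```

A measured Fano factor clearly different from 1 is one way to quantify a deviation from the Poisson null hypothesis.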

38 of 88

A Physical Model of the Single-Cell Sequencing Process

39 of 88

A Dishonest Coin Flip Decides Whether Each mRNA Will Be Sequenced

40 of 88

The Statistics of Coin Flips

41 of 88

The Statistics of Coin Flips

42 of 88

The Order of Coin Flips Doesn’t Matter
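
One way to write this out, assuming each of the N mRNA molecules is captured independently with probability p (the symbols N, k, and p are notation chosen here, not taken from the slides):

```latex
% Any one particular sequence with k captures and N-k misses has the same probability,
P(\text{one specific order}) = p^{k}\,(1-p)^{N-k},
% and there are \binom{N}{k} distinct sequences with exactly k captures, so
P(k \mid N, p) = \binom{N}{k}\, p^{k}\,(1-p)^{N-k},
\qquad \binom{N}{k} = \frac{N!}{k!\,(N-k)!}.
```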

43 of 88

The Statistics of Coin Flips

44 of 88

The Binomial Distribution: One of the Great Probability Distributions
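
A sketch of the dishonest-coin picture as a simulation: every mRNA in a cell gets one biased coin flip, and the number of captured molecules is compared with the binomial formula. The values of `n_mrna`, `p_capture`, and `n_cells` are hypothetical and chosen only for illustration.

```python
import math
import random

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent flips with bias p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def capture(n_mrna, p_capture, rng):
    """Flip one biased coin per mRNA and return how many were captured."""
    return sum(rng.random() < p_capture for _ in range(n_mrna))

rng = random.Random(0)            # fixed seed for reproducibility
n_mrna, p_capture = 20, 0.1       # hypothetical values
n_cells = 20000
captured = [capture(n_mrna, p_capture, rng) for _ in range(n_cells)]

empirical_mean = sum(captured) / n_cells
for k in range(5):
    observed = captured.count(k) / n_cells
    predicted = binomial_pmf(k, n_mrna, p_capture)
    print(f"k = {k}: simulated {observed:.4f}, binomial {predicted:.4f}")
print(f"mean captured = {empirical_mean:.3f} (theory: {n_mrna * p_capture:.3f})")
```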

45 of 88

Add Savage Rosenfeld and other explanations of the Binomial distribution

46 of 88

What Happens With the mRNA Molecules That Were Not Captured?

47 of 88

A Measure of Our Precision: The Debate over Zero Inflation

48 of 88

A Challenge to Quantitative Single-Cell RNA Sequencing: Zero Inflation and Dropout Probability
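
A sketch of the baseline against which zero inflation can be judged, under two assumptions from earlier in the deck: a constitutive promoter gives Poisson mRNA counts, and each molecule is captured by an independent biased coin flip. Under those assumptions the sequenced counts are again Poisson, with mean p times the cellular mean, so a fraction exp(-p * mean) of cells read zero from sampling alone. The parameter values and the Poisson sampler (Knuth's multiplication method) are illustrative choices, not taken from the slides.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sequence_cell(mean_mrna, p_capture, rng):
    """Poisson mRNA count in the cell, then one biased coin flip per molecule."""
    n_mrna = sample_poisson(mean_mrna, rng)
    return sum(rng.random() < p_capture for _ in range(n_mrna))

rng = random.Random(1)
mean_mrna, p_capture = 15.0, 0.1   # hypothetical values
n_cells = 20000
reads = [sequence_cell(mean_mrna, p_capture, rng) for _ in range(n_cells)]

observed_zero_fraction = reads.count(0) / n_cells
expected_zero_fraction = math.exp(-p_capture * mean_mrna)
observed_mean = sum(reads) / n_cells

print(f"zeros: simulated {observed_zero_fraction:.4f}, "
      f"sampling-only prediction {expected_zero_fraction:.4f}")
print(f"mean reads per cell = {observed_mean:.3f}")
```

In this picture, only an excess of zeros beyond the sampling-only prediction would call for an extra dropout mechanism.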

49 of 88

SANITY: Assigning Error Bars to scRNA-Seq Data

50 of 88

Querying the Transcriptome

51 of 88

Querying the Transcriptome at the Single Cell Level

52 of 88

The Single-Cell Sequencing Revolution

  • 44,494 cells and 10,000 genes measured
  • How do you reduce this 10,000-dimensional data to 2 dimensions?
  • How do you identify cell types?

The Tabula Muris project

53 of 88

Cellular Decisions Are Often Driven by a Handful of Genes

54 of 88

Our Toy Model: 2D Synthetic Transcriptome

55 of 88

Identifying Cell Types in a 2D Synthetic Transcriptome

56 of 88

Finding the Right Coordinate System to Describe our Data

57 of 88

Finding the Right Coordinate System to Describe our Data

58 of 88

Key Idea: Finding the “Right” Coordinates

59 of 88

Key Idea: Finding the “Right” Coordinates

60 of 88

A Toy Model From Mechanics of the Key Idea: Finding the “Right” Coordinates For Two Coupled Oscillators

(Berman et al.)

61 of 88

Solving the Coupled Oscillators in a Bad Coordinate System

(Berman et al.)

62 of 88

Plotting The Two Coordinates Together Reveals Structure

63 of 88

Finding the “Right” Coordinates

(Berman et al.)

64 of 88

The “Right” Coordinates Reveal the Natural Variables of the System

65 of 88

Several Ways of Looking at the Problem: One from Mechanics, One as Data

(Berman et al.)

66 of 88

The Covariance Matrix of Our Rotated Data

67 of 88

Eigenvectors and Eigenvalues

68 of 88

The “Right” Coordinates Reveal the Natural Variables of the System

69 of 88

The Eigenvectors of the Covariance Matrix Define the Normal Modes (or Principal Components)
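
A minimal sketch of this statement for two-dimensional data, using only the standard library. Synthetic points are spread widely along one axis and narrowly along the other, rotated by a known angle, and the eigenvectors of the covariance matrix recover that rotation. The rotation angle, spreads, and helper names are assumptions made for the example; the closed-form eigen-decomposition applies to symmetric 2x2 matrices only.

```python
import math
import random

def covariance_2d(points):
    """Sample covariance entries (cxx, cxy, cyy) of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    cyy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return cxx, cxy, cyy

def eig_symmetric_2x2(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]]."""
    center = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - d)   # angle of the major axis
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (center + radius, center - radius), (v1, v2)

rng = random.Random(2)
true_angle = math.radians(30)                # hypothetical rotation
cos_t, sin_t = math.cos(true_angle), math.sin(true_angle)
points = []
for _ in range(5000):
    u, w = rng.gauss(0, 3.0), rng.gauss(0, 0.5)   # natural coordinates
    points.append((u * cos_t - w * sin_t, u * sin_t + w * cos_t))

cxx, cxy, cyy = covariance_2d(points)
(lam1, lam2), (v1, v2) = eig_symmetric_2x2(cxx, cxy, cyy)
recovered_angle = math.degrees(math.atan2(v1[1], v1[0]))

print(f"covariance matrix: [[{cxx:.2f}, {cxy:.2f}], [{cxy:.2f}, {cyy:.2f}]]")
print(f"eigenvalues: {lam1:.2f}, {lam2:.2f}")
print(f"first principal direction at {recovered_angle:.1f} degrees")
```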

70 of 88

Your Turn: A Synthetic Transcriptome Made of Two Constitutive Promoters

71 of 88

Your Turn: Creating a Synthetic Transcriptome

72 of 88

Your Turn: Creating a Synthetic Transcriptome

73 of 88

Your Turn: Finding the Right Coordinate System to Describe our Data

74 of 88

Projecting Data Using the Dot Product
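
A small sketch of the projection step: the coordinate of a data point along a new axis is its dot product with the unit vector pointing along that axis. The three "cells" and the two directions are made-up numbers for illustration.

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def project(points, direction):
    """Coordinate of each point along `direction` (normalized to unit length)."""
    norm = math.sqrt(dot(direction, direction))
    unit = tuple(c / norm for c in direction)
    return [dot(p, unit) for p in points]

# Made-up (gene 1, gene 2) counts that happen to lie on a line of slope 1
cells = [(2.0, 1.0), (4.0, 3.0), (6.0, 5.0)]

along_diagonal = project(cells, (1.0, 1.0))
along_antidiagonal = project(cells, (1.0, -1.0))

print("along (1, 1): ", [round(c, 3) for c in along_diagonal])
print("along (1, -1):", [round(c, 3) for c in along_antidiagonal])
```

In this example every cell has the same coordinate along (1, -1), so the coordinate along (1, 1) alone distinguishes the cells.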

75 of 88

Finding the Right Coordinate System to Describe our Data

76 of 88

The Eigenvalues Report on the Spread of the Data Along Each Dimension

77 of 88

The Error in a Reduced Description of our System
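
A sketch of one way to quantify this error in two dimensions: keep only the coordinate along the first eigenvector of the covariance matrix, reconstruct each point from that single number, and measure what is lost. With this construction the mean squared error equals the discarded eigenvalue. The synthetic data (two correlated genes sharing a common fluctuation) and all names are assumptions for the example.

```python
import math
import random

def principal_axes(points):
    """Return centered points, eigenvalues (largest first), and unit eigenvectors."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / (n - 1)
    d = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    radius = math.hypot((a - d) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - d)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return centered, ((a + d) / 2 + radius, (a + d) / 2 - radius), (v1, v2)

rng = random.Random(3)
points = []
for _ in range(2000):
    shared = rng.gauss(0, 4.0)    # fluctuation common to both genes
    points.append((20 + shared + rng.gauss(0, 1.0),
                   20 + shared + rng.gauss(0, 1.0)))

centered, (lam1, lam2), (v1, v2) = principal_axes(points)

# Reduced description: keep only the coordinate t along v1, then reconstruct
squared_error = 0.0
for x, y in centered:
    t = x * v1[0] + y * v1[1]
    rx, ry = x - t * v1[0], y - t * v1[1]   # what the 1D description misses
    squared_error += rx * rx + ry * ry

mean_squared_error = squared_error / (len(points) - 1)
fraction_lost = lam2 / (lam1 + lam2)

print(f"eigenvalues: {lam1:.2f}, {lam2:.2f}")
print(f"mean squared error of the 1D description: {mean_squared_error:.2f}")
print(f"fraction of total variance lost: {fraction_lost:.3f}")
```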

78 of 88

The 3D Synthetic Transcriptome

79 of 88

Finding the Natural Coordinates of Our Synthetic Transcriptome

80 of 88

Dimensionality Reduction: Most of the Relevant Information Lives on a Plane

81 of 88

The Error in a Reduced Description of our System

82 of 88

Homework: Adding Downstream Genes

83 of 88

Homework: Adding Noisy, Uncorrelated Genes

84 of 88

A More Common Definition of PCA

85 of 88

Quantifying C. elegans movement and shape

86 of 88

The Eigenworm!

87 of 88

A Simple Synthetic Transcriptome

  • Each gene is driven by a constitutive promoter with low/high expression levels of 15/30 mRNA per cell
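
A sketch of how such a synthetic transcriptome could be generated, assuming two genes, Poisson-distributed counts for each constitutive promoter, and two cell types that differ in which gene is high. The 15/30 levels come from the bullet above; the cell-type assignment, the number of cells, and the sampler are illustrative assumptions.

```python
import math
import random

LOW, HIGH = 15, 30   # mean mRNA per cell

def sample_poisson(mean, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

# Hypothetical cell types: mean expression of (gene 1, gene 2)
CELL_TYPES = {"A": (HIGH, LOW), "B": (LOW, HIGH)}

def make_transcriptome(n_per_type, rng):
    """Return a list of (gene 1, gene 2) counts and the matching type labels."""
    cells, labels = [], []
    for label, means in CELL_TYPES.items():
        for _ in range(n_per_type):
            cells.append(tuple(sample_poisson(m, rng) for m in means))
            labels.append(label)
    return cells, labels

rng = random.Random(4)
cells, labels = make_transcriptome(1000, rng)

for label in CELL_TYPES:
    members = [c for c, lab in zip(cells, labels) if lab == label]
    mean_gene1 = sum(c[0] for c in members) / len(members)
    mean_gene2 = sum(c[1] for c in members) / len(members)
    print(f"type {label}: mean gene 1 = {mean_gene1:.1f}, mean gene 2 = {mean_gene2:.1f}")
```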

88 of 88

Finding Cell Types in the Transcriptome Using k-means Clustering
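
A minimal sketch of k-means (Lloyd's iteration) on a two-gene synthetic transcriptome, written with the standard library only. It assumes two cell types with Poisson counts at mean levels of 15 and 30 mRNA per cell; the cell-type layout, cell numbers, and function names are hypothetical, and this plain implementation is not necessarily the one used in the course.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def kmeans(points, k, rng, n_iter=50):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = rng.sample(sorted(set(points)), k)   # k distinct starting points
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])     # keep an empty cluster in place
        if new_centers == centers:                 # converged
            break
        centers = new_centers
    return centers, labels

rng = random.Random(5)
true_means = [(30, 15), (15, 30)]                  # hypothetical cell types
cells, true_types = [], []
for cell_type, means in enumerate(true_means):
    for _ in range(300):
        cells.append(tuple(sample_poisson(m, rng) for m in means))
        true_types.append(cell_type)

centers, labels = kmeans(cells, 2, rng)

# Cluster indices are arbitrary, so score both possible matchings
agree = sum(lab == t for lab, t in zip(labels, true_types)) / len(cells)
accuracy = max(agree, 1 - agree)

for center in sorted(centers):
    print(f"cluster center: gene 1 = {center[0]:.1f}, gene 2 = {center[1]:.1f}")
print(f"fraction of cells assigned to the correct type: {accuracy:.3f}")
```

The number of clusters is an input here, so in real data the choice of k is itself a modeling decision.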