1 of 38

Algorithm for Cellular Reprogramming

Scott Ronquist

01/10/2018

2 of 38

Presentation Outline

  • Background Information
    • Cellular reprogramming, transcription factors, Hi-C
  • Algorithm for Cellular Reprogramming
    • Goal, data, algorithm, results, ongoing work/future directions
  • Revisiting Weintraub
    • Background, data, results
  • Fibroblast to Hepatocyte Reprogramming
    • Background, data, results
  • Conclusion

3 of 38

Background: Cellular Reprogramming

  • The ability to convert one cell type into another cell type

[Figure: Cell A to Cell B]

4 of 38

Background: Cellular Reprogramming

  • The ability to convert one cell type into another cell type
  • Discovered in 1989 by Dr. Weintraub
    • Converted human fibroblasts to muscle-like cells

[Figure: Fibroblast to Muscle (Dr. Weintraub, 1989)]

Weintraub, Harold, et al. "Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD." PNAS 86.14 (1989): 5434-5438.

5 of 38

Background: Cellular Reprogramming

  • The ability to convert one cell type into another cell type
  • Discovered in 1989 by Dr. Weintraub
    • Converted human fibroblasts to muscle-like cells
  • Reprogramming of fibroblasts to an embryonic stem cell (ESC)-like state discovered in 2007 by Dr. Yamanaka
    • Induced pluripotent stem cell (iPSC)
    • Nobel Prize 2012

[Figure: Fibroblast to Muscle (Dr. Weintraub, 1989); Fibroblast to Embryonic Stem Cell (Dr. Yamanaka, 2007)]

Takahashi, Kazutoshi, et al. "Induction of pluripotent stem cells from adult human fibroblasts by defined factors." Cell 131.5 (2007): 861-872.

6 of 38

Background: Transcription Factors

  • Transcription factors (TFs) are proteins that bind to DNA
  • TFs are often used to mediate reprogramming
    • MYOD1 for fibroblast to muscle
    • OCT4, SOX2, KLF4, and MYC for fibroblast to iPSC
  • TFs bind to specific sequences in promoter and enhancer regions to control gene expression

[Figure: Fibroblast to Muscle via MYOD1 (Dr. Weintraub, 1989); Fibroblast to Embryonic Stem Cell via OCT4, SOX2, KLF4, MYC (Dr. Yamanaka, 2007)]

7 of 38

Background: Hi-C

  • Genome-wide Chromosome Conformation Capture (Hi-C)
  • Hi-C obtains information on genome structure

[Figure: chromatin fiber with genes A, B, C, D, E]

Lieberman-Aiden, Erez, et al. "Comprehensive mapping of long-range interactions reveals folding principles of the human genome." Science 326.5950 (2009): 289-293.

8 of 38

Background: Hi-C

  • Genome-wide Chromosome Conformation Capture (Hi-C)
  • Hi-C obtains information on genome structure
  • Counts the number of times pairs of genomic loci come into contact in 3D space

[Figure: chromatin fiber with genes A-E, and the corresponding Hi-C matrix with genes A-E on both axes; entries are marked "contact" or "no contact"]

9 of 38

Background: Hi-C

  • Genome-wide Chromosome Conformation Capture (Hi-C)
  • Hi-C obtains information on genome structure
  • Counts the number of times pairs of genomic loci come into contact in 3D space
  • Hi-C captures both intra- and inter-chromosomal interactions
  • Typically obtained from bulk samples

[Figure: genome-wide Hi-C contact map, with intra-chromosome and inter-chromosome regions labeled]
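The counting idea behind a Hi-C contact map can be sketched in a few lines. This is a minimal illustration only: the loci A-E follow the figure labels, while the list of observed contacts is made up and is not real Hi-C data.

```python
def contact_matrix(loci, contacts):
    """Count how often each pair of loci is observed in contact."""
    index = {name: i for i, name in enumerate(loci)}
    counts = [[0] * len(loci) for _ in loci]
    for a, b in contacts:
        i, j = index[a], index[b]
        counts[i][j] += 1
        if i != j:
            counts[j][i] += 1  # contacts are unordered, so keep the map symmetric
    return counts


loci = ["A", "B", "C", "D", "E"]
# Hypothetical observed contacts (illustrative only)
contacts = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "E"), ("D", "E")]
hic = contact_matrix(loci, contacts)
for name, row in zip(loci, hic):
    print(name, row)
```

Each entry counts how many times the row locus and the column locus were seen together; a zero corresponds to "no contact" in the figure.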

10 of 38

Algorithm for Cellular Reprogramming: Goal

  • Problem
    • Cellular reprogramming experiments are time-consuming and costly

[Figure: Fibroblast to Muscle (MYOD1); Fibroblast to ESC (OCT4, SOX2, KLF4, MYC)]

11 of 38

Algorithm for Cellular Reprogramming: Goal

  • Problem
    • Cellular reprogramming experiments are time-consuming and costly
  • Goal
    • Predict TFs that can be used for cellular reprogramming

[Figure: known conversions Fibroblast to Muscle (MYOD1) and Fibroblast to ESC (OCT4, SOX2, KLF4, MYC); additional cell types β Cell and Hepatocyte, with unknown factors TF A and TF B]

12 of 38

Algorithm for Cellular Reprogramming: Goal

  • Problem
    • Cellular reprogramming experiments are time-consuming and costly
  • Goal
    • Predict TFs that can be used for cellular reprogramming
    • Any cell to any cell

[Figure: known conversions Fibroblast to Muscle (MYOD1) and Fibroblast to ESC (OCT4, SOX2, KLF4, MYC); additional cell types β Cell and Hepatocyte, with unknown factors TF A, TF B, and TF C]

13 of 38

Algorithm for Cellular Reprogramming: Goal

  • Problem
    • Cellular reprogramming experiments are time-consuming and costly
  • Goal
    • Predict TFs that can be used for cellular reprogramming
    • Any cell to any cell
  • Application
    • Tissue regeneration and disease modelling
    • Cancer treatment

[Figure: known conversions Fibroblast to Muscle (MYOD1) and Fibroblast to ESC (OCT4, SOX2, KLF4, MYC); additional cell types β Cell and Hepatocyte, with unknown factors TF A, TF B, and TF C; Cancerous to Non-cancerous via unknown factor TF D]

14 of 38

Algorithm for Cellular Reprogramming: Data – natural dynamics

[Figure: sampling timeline marked 0h, 8h, ..., 56h, with a synchronization step]

  • Time series Hi-C and RNA-seq
  • Proliferating human fibroblasts
  • Cell cycle and circadian rhythm synchronized
    • Serum starvation and dexamethasone
  • 8 time points, 8 hours apart

15 of 38

Algorithm for Cellular Reprogramming: Data – target states

  • Gene expression in the target states is needed to determine which genes need to be controlled
  • GTEx provides gene expression data for 56 tissues

16 of 38

Algorithm for Cellular Reprogramming: Control Theory

  • Formally introduced in 1868 by James Clerk Maxwell
  • System: group of working parts that we wish to control
    • Cell/genome
  • Controller: the means of changing the system
    • TFs
  • Sensor: measurements obtained on the system
    • RNA-seq, Hi-C

[Figure: block diagram with Controller, System, Sensor, and Output]

17 of 38

Algorithm for Cellular Reprogramming: Control Theory

  • Formally introduced in 1868 by James Clerk Maxwell
  • System: group of working parts that we wish to control
    • Cell/genome
  • Controller: the means of changing the system
    • TFs
  • Sensor: measurements obtained on the system
    • RNA-seq, Hi-C

[Figure: block diagram with Controller (TFs), System (Cell), Sensor (Hi-C, RNA-seq), and Output (New Cell State)]

No continuous feedback in our system: we can't monitor RNA-seq in live cells

18 of 38

Algorithm for Cellular Reprogramming: Control Theory

  • Control equation for our biological system

[Equation: discrete model]

Rajapakse, Indika, Mark Groudine, and Mehran Mesbahi. "Dynamics and control of state-dependent networks for probing genomic organization." PNAS (2011).
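A standard discrete-time linear control model, consistent with the state transition (A) matrix, the B matrix, and the TF input named on later slides, can be written as follows. The notation here is assumed for illustration and is not taken from the slide.

```latex
x_{k+1} = A_k \, x_k + B \, u_k
```

Under this reading, $x_k$ is the state of the cell (expression) at time point $k$, $A_k$ describes the natural dynamics between time points $k$ and $k+1$, $B$ describes how each TF influences the state, and $u_k$ is the amount of each TF added at time point $k$. With $u_k = 0$ the system simply follows its natural dynamics.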

19 of 38

Algorithm for Cellular Reprogramming: Control Theory

20-26 of 38

Algorithm for Cellular Reprogramming: Control Theory - example

[Figure, built up over slides 20-26: plot of Gene A expression vs. Gene B expression. Slides 20-22 are labeled "Natural dynamics, no control input"; slide 26 marks the target state.]

Time of input is important!
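Why the time of input matters can be illustrated with a toy two-gene simulation. This is a sketch under stated assumptions: the dynamics matrix `A`, input matrix `B`, initial state, and number of steps are all hypothetical values chosen for illustration, and the update rule is a generic discrete-time linear model rather than the model fitted in this work.

```python
import numpy as np

# Hypothetical 2-gene system (all numbers are illustrative assumptions)
A = np.array([[0.9, 0.2],
              [-0.2, 0.9]])   # natural dynamics: a slowly decaying rotation
B = np.array([[1.0],
              [0.0]])         # the single input pushes Gene A only
x0 = np.array([1.0, 0.0])     # initial expression of (Gene A, Gene B)
steps = 6


def simulate(input_step=None, amount=1.0):
    """Return the final state; apply one input pulse at `input_step` if given."""
    x = x0.copy()
    for k in range(steps):
        u = np.array([amount]) if k == input_step else np.array([0.0])
        x = A @ x + B @ u
    return x


natural = simulate()
early = simulate(input_step=0)
late = simulate(input_step=steps - 1)
print("no input:   ", natural)
print("early input:", early)
print("late input: ", late)
```

The same input amount lands the system in a different final state depending on when it is applied, because an early input is carried along (and rotated) by the natural dynamics while a late input is not.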

27 of 38

Algorithm for Cellular Reprogramming: State Representation

  • We use RNA-seq to represent the state of our system (cell) at each time point
  • To reduce model complexity, we group gene expression by topologically associating domain (TAD)

[Figure: chromatin fiber with genes A-E grouped into a TAD; expression shown for genes over time and for TADs over time. Caption: biologically inspired dimension reduction]

Dixon, Jesse R., et al. "Topological domains in mammalian genomes identified by analysis of chromatin interactions." Nature 485.7398 (2012): 376.
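The TAD-level dimension reduction can be sketched as a simple grouping step. The expression values and the gene-to-TAD assignment below are hypothetical, and summing the genes within a TAD is an assumption made for illustration; a mean or another summary could be used instead.

```python
def aggregate_by_tad(expression, gene_to_tad):
    """Sum gene-level expression into one value per TAD at each time point."""
    n_times = len(next(iter(expression.values())))
    tads = {}
    for gene, series in expression.items():
        totals = tads.setdefault(gene_to_tad[gene], [0.0] * n_times)
        for t, value in enumerate(series):
            totals[t] += value
    return tads


# Hypothetical expression for genes A-E at three time points
expression = {
    "A": [1.0, 2.0, 3.0],
    "B": [0.5, 0.5, 1.0],
    "C": [4.0, 3.0, 2.0],
    "D": [1.0, 1.0, 1.0],
    "E": [2.0, 0.0, 1.0],
}
# Hypothetical TAD assignment
gene_to_tad = {"A": "TAD1", "B": "TAD1", "C": "TAD2", "D": "TAD2", "E": "TAD2"}

tad_expression = aggregate_by_tad(expression, gene_to_tad)
print(tad_expression)
```

Five gene-level time series become two TAD-level time series, which is the reduction from a genes-by-time matrix to a TADs-by-time matrix shown in the figure.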

28 of 38

Algorithm for Cellular Reprogramming: State Representation

  • TADs are determined from Hi-C
  • Genes within TADs show correlated expression over time

Chen, Jie, Alfred O. Hero III, and Indika Rajapakse. "Spectral identification of topological domains." Bioinformatics 32.14 (2016): 2151-2158.

Chen, H., Chen, J., Muir, L. A., Ronquist, S., et al. "Functional organization of the human 4D Nucleome." Proceedings of the National Academy of Sciences 112.26 (2015): 8002-8007.

29 of 38

Algorithm for Cellular Reprogramming: State Representation

  • TADs are determined from Hi-C
  • Genes within TADs show correlated expression over time
  • Iterative partitioning of chromatin into nodal domains via the Fiedler vector can determine TAD structure
    • The Fiedler vector partitions a graph into 2 sub-graphs
    • Sub-graphs with poor connectivity are divided further

Chen, Jie, Alfred O. Hero III, and Indika Rajapakse. "Spectral identification of topological domains." Bioinformatics 32.14 (2016): 2151-2158.

Chen, H., Chen, J., Muir, L. A., Ronquist, S., et al. "Functional organization of the human 4D Nucleome." Proceedings of the National Academy of Sciences 112.26 (2015): 8002-8007.
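A single Fiedler-vector split can be sketched as follows. This is not the published TAD-calling procedure: it uses the plain (unnormalized) graph Laplacian on a small hypothetical contact matrix, and it performs only one split.

```python
import numpy as np


def fiedler_partition(W):
    """Split a weighted graph in two by the sign of its Fiedler vector."""
    laplacian = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order for symmetric matrices
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    fiedler_value = eigenvalues[1]        # algebraic connectivity of the graph
    fiedler_vector = eigenvectors[:, 1]   # eigenvector of the 2nd-smallest eigenvalue
    return fiedler_vector >= 0, fiedler_value


# Hypothetical contacts for 6 loci: two dense blocks joined by weak contacts
W = np.array([
    [0, 9, 8, 1, 0, 0],
    [9, 0, 9, 1, 1, 0],
    [8, 9, 0, 1, 0, 1],
    [1, 1, 1, 0, 9, 8],
    [0, 1, 0, 9, 0, 9],
    [0, 0, 1, 8, 9, 0],
], dtype=float)

labels, connectivity = fiedler_partition(W)
print(labels, connectivity)
```

The sign pattern separates the first three loci from the last three. An iterative version would re-apply the same split to each sub-graph whose Fiedler value (connectivity) falls below some chosen threshold.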

30 of 38

Algorithm for Cellular Reprogramming: A matrix – calculation

Brockett, Roger W. "Finite dimensional linear systems." (1970).

31 of 38

Algorithm for Cellular Reprogramming: B matrix – TF binding sites

  • Represents the influence of each TF on each gene
  • We approximate this by finding TF binding sites (TFBS) near a gene's transcription start site (TSS)
  • TFBS are genomic locations where TFs prefer to bind
  • We scan the reference genome to determine TFBS locations for 547 TFs
  • TFBS motifs downloaded from MotifDb

[Figure: MYOD1 TFBS and SOX2 TFBS located near the TSS of Gene A and Gene B]

Grant, Charles E., Timothy L. Bailey, and William Stafford Noble. "FIMO: scanning for occurrences of a given motif." Bioinformatics 27.7 (2011): 1017-1018.

Shannon, P., and M. Richards. "MotifDb: An annotated collection of protein-DNA binding sequence motifs." R package version 1.0 (2014).
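Counting TFBS near each TSS can be sketched as below. The coordinates are invented, and the 5,000 bp window is a hypothetical parameter chosen for illustration; the slides do not state the window actually used.

```python
def count_tfbs_near_tss(tss_by_gene, tfbs_by_tf, window=5000):
    """Count binding sites of each TF within `window` bp of each gene's TSS."""
    counts = {}
    for gene, (chrom, tss) in tss_by_gene.items():
        counts[gene] = {
            tf: sum(1 for c, pos in sites if c == chrom and abs(pos - tss) <= window)
            for tf, sites in tfbs_by_tf.items()
        }
    return counts


# Hypothetical TSS positions: gene -> (chromosome, position)
tss_by_gene = {"Gene A": ("chr1", 10_000), "Gene B": ("chr1", 50_000)}
# Hypothetical TFBS positions: TF -> list of (chromosome, position)
tfbs_by_tf = {
    "MYOD1": [("chr1", 9_500), ("chr1", 12_000), ("chr1", 80_000)],
    "SOX2": [("chr1", 49_000), ("chr2", 10_100)],
}

tfbs_counts = count_tfbs_near_tss(tss_by_gene, tfbs_by_tf)
print(tfbs_counts)
```

The resulting gene-by-TF counts are the raw material for the B matrix; in the TAD-level model they would be grouped per TAD rather than per gene.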

32 of 38

Algorithm for Cellular Reprogramming: B matrix – TF function and chromatin accessibility

  • Literature search used to determine TF function
    • Either activation or repression
  • DNase-seq determines chromatin accessibility genome-wide
  • We use publicly available DNase-seq to determine whether a TF can bind to its target

[Figure: four panels of Gene A expression, labeled Activator, Repressor, Multiple TFBS (several TFs), and Inaccessible TFBS]

33 of 38

Algorithm for Cellular Reprogramming: B matrix – TF function and chromatin accessibility

  • Literature search used to determine TF function
    • Either activation or repression
  • DNase-seq determines chromatin accessibility genome-wide
  • We use publicly available DNase-seq to determine whether a TF can bind to its target

[Figure: B matrix with dimensions labeled TF (547) and TAD (2,245), built from # of TF binding sites, TF function, and chromatin accessibility. Example entries (binding-site counts): 8, 5, 2, 3]

34 of 38

Algorithm for Cellular Reprogramming: B matrix – TF function and chromatin accessibility

  • Literature search used to determine TF function
    • Either activation or repression
  • DNase-seq determines chromatin accessibility genome-wide
  • We use publicly available DNase-seq to determine whether a TF can bind to its target

[Figure: B matrix with dimensions labeled TF (547) and TAD (2,245), built from # of TF binding sites, TF function, and chromatin accessibility. Example entries after applying TF function: 8, -5, 2, -3 (negative entries belong to a repressor)]

35 of 38

Algorithm for Cellular Reprogramming: B matrix – TF function and chromatin accessibility

  • Literature search used to determine TF function
    • Either activation or repression
  • DNase-seq determines chromatin accessibility genome-wide
  • We use publicly available DNase-seq to determine whether a TF can bind to its target

[Figure: B matrix with dimensions labeled TF (547) and TAD (2,245), built from # of TF binding sites, TF function, and chromatin accessibility. Example entries after applying accessibility: 8, -5, 0, 0 (zeroed entries are inaccessible)]
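The three ingredients of the B matrix combine as an element-wise product, sketched here with the example numbers from the slides. Reading the four numbers as a 2 x 2 block with TADs as rows and TFs as columns, with the second TF a repressor and the second TAD inaccessible, is an assumption about the figure layout.

```python
import numpy as np

# Binding-site counts: rows are TADs, columns are TFs (layout assumed)
tfbs_counts = np.array([[8, 5],
                        [2, 3]])
# TF function per column: +1 for an activator, -1 for a repressor
tf_sign = np.array([1, -1])
# Chromatin accessibility per row: 1 if accessible, 0 if inaccessible
accessible = np.array([1, 0])

signed = tfbs_counts * tf_sign[np.newaxis, :]   # apply TF function
B = signed * accessible[:, np.newaxis]          # zero out inaccessible targets
print(signed)
print(B)
```

At full scale the same product would be applied to the 2,245 x 547 matrix of counts.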

36 of 38

Algorithm for Cellular Reprogramming: TF Scoring

  • Out of all possible combinations, amounts, and timings of TF input, which control gets the system as close as possible to the target state?
    • Combinations of up to 4 TFs considered
    • TFs must be overexpressed in the target cell type

[Figure: plot of Gene A expression vs. Gene B expression, with the target state marked]
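The search over combinations, amounts, and timing can be sketched as a brute-force loop. Everything here is an illustrative assumption rather than the scoring used in this work: the dynamics are time-invariant, each candidate is a single input pulse, amounts come from an ordinary least-squares fit followed by a crude clip to non-negative values, and the filter for TFs overexpressed in the target cell type is omitted. The function name `score_tf_sets` and all numbers are hypothetical.

```python
from itertools import combinations

import numpy as np


def score_tf_sets(A, B, x0, target, steps, max_tfs=4):
    """Rank (TF subset, input step) pairs by final distance to the target."""
    free = np.linalg.matrix_power(A, steps) @ x0  # final state with no input
    results = []
    for size in range(1, max_tfs + 1):
        for tfs in combinations(range(B.shape[1]), size):
            for k in range(steps):
                # Effect on the final state of a single input applied at step k
                reach = np.linalg.matrix_power(A, steps - 1 - k) @ B[:, list(tfs)]
                amounts, *_ = np.linalg.lstsq(reach, target - free, rcond=None)
                amounts = np.clip(amounts, 0.0, None)  # crude: no negative doses
                distance = float(np.linalg.norm(free + reach @ amounts - target))
                results.append((distance, tfs, k, amounts))
    return sorted(results, key=lambda r: r[0])


# Hypothetical system: 2 state dimensions, 3 candidate TFs
A = np.array([[0.9, 0.2],
              [-0.2, 0.9]])
B = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5]])
x0 = np.array([1.0, 0.0])
target = np.array([2.0, 1.0])

ranking = score_tf_sets(A, B, x0, target, steps=4, max_tfs=2)
best_distance, best_tfs, best_step, best_amounts = ranking[0]
print(best_tfs, best_step, best_distance)
```

Each entry in `ranking` records which TFs were used, when they were added, how much of each was added, and how far the final state ends up from the target, so the top entries answer the question posed on this slide for the toy system.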

37 of 38

Algorithm for Cellular Reprogramming: Results

  • Algorithm predicts TFs known to reprogram cells
    • MYOD1 is ranked 2nd for fibroblast to myotube
    • MYCN, NANOG, and POU5F1 are the top combination for fibroblast to ESC
  • Our dynamic model allows for determination of the optimal time of input

38 of 38

Algorithm for Cellular Reprogramming: Ongoing Work and Future Directions

  • Additional targets added from FANTOM5
    • Over 2,000 samples from over 200 cell types
  • Additional TFs added from HumanTF
    • From 547 TFs to 1,200
  • ChromHMM used to determine gene promoter regions
  • STRING DB used to inform state transition matrix and TF function
    • Use known gene regulatory networks to inform our state transition matrix
  • No dimension reduction
  • This work is part of iReprogram, LLC

Lizio, Marina, et al. "Gateways to the FANTOM5 promoter level mammalian expression atlas." Genome biology 16.1 (2015): 22.

Lambert, Samuel A., et al. "The human transcription factors." Cell 172.4 (2018): 650-665.

Ernst, Jason, and Manolis Kellis. "Chromatin-state discovery and genome annotation with ChromHMM." Nature Protocols 12.12 (2017): 2478.

Szklarczyk, Damian, et al. "The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible." Nucleic acids research (2016): gkw937.