1 of 23

RNA-Seq data, part II

CSE 180

Yana Safonova

2 of 23

RNA-seq technologies

3 of 23

RNA-seq pipelines

A survey of best practices for RNA-seq data analysis, review, Genome Biol

4 of 23

Finding isoforms using genome mapping

5 of 23

Coverage-based isoform finding

Isoforms   Coverage
ACD        10
ACE        100
BCD        5
BCE        5

6 of 23

Coverage-based isoform finding

Isoforms   Coverage
ACD        10
ACE        100
BCD        5
BCE        5

Coverage of each exon (sum over the isoforms that contain it):

A: 10 + 100
B: 5 + 5
C: 10 + 100 + 5 + 5
D: 5 + 10
E: 5 + 100
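The sums on this slide can be reproduced with a few lines of code. The sketch below assumes that each letter in an isoform name stands for one exon and that the coverage of an exon is simply the sum of the coverages of all isoforms containing it; the function and variable names are illustrative only.

```python
# Isoform coverages taken from the table on this slide.
isoform_coverage = {"ACD": 10, "ACE": 100, "BCD": 5, "BCE": 5}


def exon_coverage(isoforms):
    """Sum the coverage of every isoform that contains each exon."""
    coverage = {}
    for isoform, depth in isoforms.items():
        for exon in isoform:  # assumption: one letter = one exon
            coverage[exon] = coverage.get(exon, 0) + depth
    return dict(sorted(coverage.items()))


print(exon_coverage(isoform_coverage))
# {'A': 110, 'B': 10, 'C': 120, 'D': 15, 'E': 105}
```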

7 of 23

Coverage-based isoform finding

Isoforms: ? ? ? ? (unknown; to be inferred from the exon coverage)

Coverage of each exon:

A: 10 + 100
B: 5 + 5
C: 10 + 100 + 5 + 5
D: 5 + 10
E: 5 + 100
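Going in the reverse direction is harder than it looks. As a small sketch, under the assumption that exon coverage is just the sum of isoform coverages, the code below shows two different sets of isoform coverages that produce exactly the same exon totals. The second candidate is a made-up alternative, not data from the slides. This suggests that exon-level coverage alone may not pin down the isoforms, and that extra evidence (for example, reads spanning exon junctions) is needed.

```python
def exon_coverage(isoforms):
    """Sum the coverage of every isoform that contains each exon."""
    coverage = {}
    for isoform, depth in isoforms.items():
        for exon in isoform:  # assumption: one letter = one exon
            coverage[exon] = coverage.get(exon, 0) + depth
    return dict(sorted(coverage.items()))


# Exon totals implied by the sums on the slide.
observed = {"A": 110, "B": 10, "C": 120, "D": 15, "E": 105}

# Candidate 1: the isoform coverages from the earlier slide.
candidate_1 = {"ACD": 10, "ACE": 100, "BCD": 5, "BCE": 5}
# Candidate 2: a hypothetical alternative (BCD absent altogether).
candidate_2 = {"ACD": 15, "ACE": 95, "BCD": 0, "BCE": 10}

print(exon_coverage(candidate_1) == observed)  # True
print(exon_coverage(candidate_2) == observed)  # True
```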

8 of 23

Splice graph

Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs

9 of 23

De novo transcriptome assembly

Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data

10 of 23

Differential expression of isoforms / genes

Hepatitis C-associated mixed cryoglobulinemic vasculitis induces differential gene expression in peripheral mononuclear cells

11 of 23

Cell sorting and differential expression of genes

Ellebedy et al., Nat Immunol, 2016, detected a new cell type using differential expression

12 of 23

Single cell RNA-seq (scRNA-seq)

droplet barcode

molecular barcode

primers

Barcoded RNAs
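A rough sketch of how the two barcodes are typically used downstream: the droplet barcode says which droplet (and so which cell) a read came from, and the molecular barcode lets reads copied from the same original RNA molecule be counted once. The reads, barcode sequences, and gene names below are hypothetical, and real pipelines also handle sequencing errors in barcodes, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical reads: (droplet barcode, molecular barcode, gene).
reads = [
    ("AAAC", "TTG", "Gene1"),
    ("AAAC", "TTG", "Gene1"),  # same droplet and molecule: an amplification duplicate
    ("AAAC", "CGA", "Gene1"),
    ("AAAC", "GGT", "Gene2"),
    ("GTCA", "TTG", "Gene1"),
    ("GTCA", "ACC", "Gene3"),
    ("GTCA", "ACC", "Gene3"),  # duplicate again
]


def count_molecules(reads):
    """Count distinct molecular barcodes per (droplet barcode, gene) pair."""
    seen = defaultdict(set)
    for droplet, molecule, gene in reads:
        seen[(droplet, gene)].add(molecule)
    return {key: len(molecules) for key, molecules in seen.items()}


counts = count_molecules(reads)
for (droplet, gene), n in sorted(counts.items()):
    print(droplet, gene, n)
```

The resulting counts per droplet and gene are the entries of a gene-by-cell expression table.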

13 of 23

Droplet barcoding

14 of 23

scRNA-seq visualization

        Sample1  Sample2  Sample3  Sample4  Sample5
Gene1   ...      ...      ...      ...      ...
Gene2   ...      ...      ...      ...      ...
Gene3   ...      ...      ...      ...      ...

15 of 23

PCA

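Principal Component Analysis (PCA) projects N-dimensional data, which are hard to visualize, onto a small number of new axes (2 for plotting) chosen to capture as much of the variance as possible, so that “natural clusters” of the original points tend to be preserved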

Bad choice of features

Good choice of features
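To make the idea concrete, here is a minimal sketch of PCA down to 2 dimensions using only the standard library. It assumes the usual formulation (eigenvectors of the covariance matrix) and finds them with power iteration plus deflation, which is one simple way to do it and not necessarily how any particular tool works. The input points are made up; in practice a library routine such as scikit-learn's `PCA` would normally be used.

```python
import math


def covariance_matrix(points):
    """Center the points and return (centered points, covariance matrix)."""
    n, dim = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - means[j] for j in range(dim)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    return centered, cov


def top_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply and normalize a start vector."""
    dim = len(matrix)
    v = [1.0 / (i + 1) for i in range(dim)]  # arbitrary non-zero start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(dim))
                     for i in range(dim))
    return eigenvalue, v


def pca_2d(points):
    """Return the first two principal axes and the points projected onto them."""
    centered, cov = covariance_matrix(points)
    dim = len(cov)
    components = []
    for _ in range(2):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the found direction so the next one can surface.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    projected = [[sum(r[j] * c[j] for j in range(dim)) for c in components]
                 for r in centered]
    return components, projected


# Made-up 3-dimensional points that vary mostly along the x = y direction.
points = [(1.0, 1.1, 0.0), (2.0, 1.9, 0.1), (3.0, 3.2, -0.1),
          (4.0, 3.8, 0.0), (5.0, 5.1, 0.1)]
components, projected = pca_2d(points)
print(components[0])  # first axis: roughly equal weight on x and y
print(projected)      # each point as 2 coordinates
```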

16 of 23

PCA example: amino acid properties

From 23 features to 11 features (numbered 1 to 11 in the figure)

17 of 23

10x Cell Ranger

Example of cell clustering based on differential expression in PBMCs from a healthy donor (method: t-SNE):

http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_web_summary.html

18 of 23

Gene networks

19 of 23

Proteomics

  • Quantitative proteomics
  • Protein sequencing
  • Drug design

20 of 23

Proteogenomics

Methods, Tools and Current Perspectives in Proteogenomics

21 of 23

Proteogenomics

CNA - copy number aberrations

eQTL - expression quantitative trait loci

PTM - post-translational modification

Methods, Tools and Current Perspectives in Proteogenomics

22 of 23

Correlation between RNA levels and protein levels

Gene‐specific correlation of RNA and protein levels in human cells and tissues
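A gene-specific correlation can be sketched as follows: for each gene, take its RNA level and its protein level across the same set of samples and compute a correlation coefficient. The example below uses the Pearson coefficient as an assumption (the cited study may use a different measure), and all values and gene names are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical levels of two genes measured in the same four samples.
rna = {"Gene1": [1.0, 2.0, 3.0, 4.0], "Gene2": [1.0, 2.0, 3.0, 4.0]}
protein = {"Gene1": [2.0, 4.1, 5.9, 8.0], "Gene2": [5.0, 1.0, 4.0, 2.0]}

correlations = {gene: pearson(rna[gene], protein[gene]) for gene in rna}
for gene, r in correlations.items():
    print(gene, round(r, 2))
```

In this toy data Gene1's protein level tracks its RNA level closely while Gene2's does not, which is the kind of gene-to-gene difference the slide title refers to.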

23 of 23

Proteogenomics example: colorectal cancer

  • Each row represents a sample.
  • Red and blue colors represent over- and under-expression, respectively.

Methods, Tools and Current Perspectives in Proteogenomics