1 of 31

RNA-Seq data

2 of 31

RNA-seq technologies

3 of 31

3 pipelines for RNA-seq analysis

A survey of best practices for RNA-seq data analysis (review, Genome Biol)

4 of 31

TopHat: genome mapping

5 of 31

Running TopHat

  • Required files
    • Reference genome
    • RNA-seq data files

  • Optional files
    • Annotation file (GFF3 or GTF)

  • Output files
    • A BAM file per sample
    • Alignment statistics
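
As a rough sketch of how these inputs fit together, the Python snippet below only assembles a TopHat command line; it does not run the aligner. It assumes TopHat 2 conventions (the reference genome supplied as a Bowtie index prefix, `-G` for the annotation, `-o` for the output directory, `-p` for the number of threads). All file names are hypothetical placeholders.

```python
import shlex


def tophat_command(bowtie_index, reads, output_dir="tophat_out",
                   annotation=None, threads=1):
    """Build (but do not run) a TopHat 2 command line as a list of arguments."""
    cmd = ["tophat", "-o", output_dir, "-p", str(threads)]
    if annotation is not None:
        # Optional GTF/GFF3 annotation of known transcripts.
        cmd += ["-G", annotation]
    cmd.append(bowtie_index)   # prefix of the Bowtie index of the reference genome
    cmd += list(reads)         # one file (single-end) or two files (paired-end)
    return cmd


# Hypothetical paired-end sample with an annotation file.
cmd = tophat_command("genome_index",
                     ["sample1_R1.fastq", "sample1_R2.fastq"],
                     annotation="genes.gtf")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)`, one call per sample. With TopHat 2 the output directory is then expected to hold the alignments (`accepted_hits.bam`) and the alignment statistics (`align_summary.txt`).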

6 of 31

Mapping visualization - IGV

7 of 31

Tools are constantly evolving

TPR = TP / (TP + FN), FPR = FP / (FP + TN)
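
The two rates above can be computed directly from a confusion matrix. The sketch below uses made-up counts purely for illustration; they do not come from any benchmark in these slides.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)   # fraction of true items that were found
    fpr = fp / (fp + tn)   # fraction of false items that were reported
    return tpr, fpr


# Hypothetical counts, e.g. correctly and incorrectly reported alignments.
tpr, fpr = rates(tp=80, fp=10, tn=90, fn=20)
print(tpr, fpr)  # 0.8 0.1
```

A tool is better the closer it gets to TPR = 1 and FPR = 0 at the same time.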

8 of 31

Finding isoforms using genome mapping

9 of 31

Coverage-based isoform finding

Isoform   Coverage
ACD       10
ACE       100
BCD       5
BCE       5

10 of 31

Coverage-based isoform finding

Isoform   Coverage
ACD       10
ACE       100
BCD       5
BCE       5

Coverage of each exon (sum over the isoforms that contain it):

Exon   Coverage
A      10 + 100
B      5 + 5
C      10 + 100 + 5 + 5
D      5 + 10
E      5 + 100
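
The per-exon sums on this slide can be reproduced with a few lines of Python. This is a minimal sketch that assumes each isoform is written as a string of single-letter exon labels, as in the table.

```python
# Isoform coverages from the slide; each letter is one exon.
isoform_coverage = {"ACD": 10, "ACE": 100, "BCD": 5, "BCE": 5}


def exon_coverage(isoforms):
    """Sum, for every exon, the coverage of all isoforms that contain it."""
    totals = {}
    for isoform, coverage in isoforms.items():
        for exon in isoform:
            totals[exon] = totals.get(exon, 0) + coverage
    return dict(sorted(totals.items()))


coverage_by_exon = exon_coverage(isoform_coverage)
print(coverage_by_exon)  # {'A': 110, 'B': 10, 'C': 120, 'D': 15, 'E': 105}
```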

11 of 31

Coverage-based isoform finding

Isoforms: ?, ?, ?, ? (unknown)

Exon   Coverage
A      10 + 100
B      5 + 5
C      10 + 100 + 5 + 5
D      5 + 10
E      5 + 100
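
To see why the reverse problem is hard, the sketch below brute-forces all isoform abundances that reproduce the exon totals (A = 110, B = 10, C = 120, D = 15, E = 105). It makes a simplifying assumption that the slide does not: the four candidate isoforms ACD, ACE, BCD, BCE are already known and only their abundances are missing.

```python
from itertools import product

ISOFORMS = ("ACD", "ACE", "BCD", "BCE")   # assumed candidate isoforms
observed = {"A": 110, "B": 10, "C": 120, "D": 15, "E": 105}


def consistent_abundances(isoforms, exon_cov):
    """List non-negative integer abundances that reproduce the exon coverage."""
    # An isoform cannot be more abundant than its least-covered exon.
    limits = [min(exon_cov[exon] for exon in iso) for iso in isoforms]
    solutions = []
    for abundances in product(*(range(limit + 1) for limit in limits)):
        totals = dict.fromkeys(exon_cov, 0)
        for iso, abundance in zip(isoforms, abundances):
            for exon in iso:
                totals[exon] += abundance
        if totals == exon_cov:
            solutions.append(abundances)
    return solutions


solutions = consistent_abundances(ISOFORMS, observed)
print(len(solutions))
print((10, 100, 5, 5) in solutions)
```

For these numbers the enumeration finds 11 assignments, including the original (10, 100, 5, 5), so exon coverage alone does not determine the isoform abundances. Additional evidence, such as reads spanning more than one splice junction, is needed to choose among them.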

12 of 31

Splice graph

Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs

13 of 31

De novo transcriptome assembly

Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data

14 of 31

Differential expression of isoforms / genes

Hepatitis C-associated mixed cryoglobulinemic vasculitis induces differential gene expression in peripheral mononuclear cells

15 of 31

Differential gene expression (DGE) tools

  • edgeR (Robinson et al., 2010)
  • DESeq/DESeq2 (Anders and Huber, 2010; Love et al., 2014)
  • limma-voom (Ritchie et al., 2015)
  • Reviews of DGE tools: Rapaport et al. (2013); Soneson and Delorenzi (2013); Schurch et al. (2015)

16 of 31

DGE methods - I

  • Align RNA-seq reads to genes
  • Identify expressed genes
  • Count reads for each expressed gene
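
The counting step can be sketched as follows. The snippet assumes the alignment step has already produced one (read, gene) pair per read, with `None` for reads that hit no gene; the data and the `min_reads` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical alignment result: (read ID, gene the read aligned to, or None).
alignments = [
    ("r1", "geneA"), ("r2", "geneA"), ("r3", "geneB"),
    ("r4", None), ("r5", "geneA"), ("r6", "geneC"),
]


def count_reads(alignments, min_reads=2):
    """Count reads per gene and keep genes with at least min_reads reads."""
    counts = Counter(gene for _, gene in alignments if gene is not None)
    return {gene: n for gene, n in counts.items() if n >= min_reads}


expressed = count_reads(alignments)
print(expressed)  # {'geneA': 3}
```

Repeating this for every sample gives a genes-by-samples count table, which is the input to the statistical tests.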

17 of 31

DGE methods - II

  • For each identified gene, test the null hypothesis that there is no systematic difference between the average read counts of the conditions being compared:
    • t-test: How dissimilar are the means of two populations?
    • ANOVA: How well does a reduced model capture the data compared to the full model with all coefficients?

  • Use heatmap and clustering (e.g., hierarchical clustering) for visualization
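
As an illustration of the t-test idea, the sketch below computes Welch's t statistic per gene with the Python standard library. The read counts are invented, and only the statistic is computed; a p-value would additionally need the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False`). Dedicated tools such as edgeR and DESeq2 rely on negative binomial models rather than a plain t-test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups of read counts."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical read counts: (control replicates, treated replicates).
counts = {
    "gene1": ([10, 12, 11, 9], [30, 28, 33, 29]),
    "gene2": ([20, 22, 19, 21], [21, 20, 22, 19]),
}

t_values = {gene: welch_t(a, b) for gene, (a, b) in counts.items()}
for gene, t in t_values.items():
    print(gene, round(t, 2))
```

A large absolute value of t (gene1) argues against the null hypothesis, while a value near zero (gene2) is consistent with it.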

18 of 31

DGE example

19 of 31

Cell sorting and differential expression of genes

Ellebedy et al. (Nat Immunol, 2016) detected a new cell type using differential expression

20 of 31

Single cell RNA-seq (scRNA-seq)

Figure labels: droplet barcode, molecular barcode, primers, barcoded RNAs

21 of 31

Droplet barcoding

22 of 31

scRNA-seq visualization

        Sample1  Sample2  Sample3  Sample4  Sample5
Gene1   ...      ...      ...      ...      ...
Gene2   ...      ...      ...      ...      ...
Gene3   ...      ...      ...      ...      ...

23 of 31

PCA

Principal Component Analysis (PCA) transforms N-dimensional data, which are hard to visualize, into 2-dimensional data by projecting the points onto the two directions of greatest variance, which often preserves the “natural clusters” of the original points

Bad choice of features

Good choice of features
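
A minimal, dependency-free sketch of PCA is given below. It finds the two leading eigenvectors of the covariance matrix by power iteration with deflation, which is one of several ways to compute them (libraries typically use an SVD instead). The eight 3-dimensional points are invented and form two obvious clusters.

```python
from math import sqrt


def covariance_matrix(rows):
    """Center the data and return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def top_eigenvector(matrix, iterations=500):
    """Power iteration; assumes the start vector is not orthogonal to the answer."""
    d = len(matrix)
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


def pca_2d(rows):
    """Project rows onto the two directions of greatest variance."""
    centered, cov = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(2):
        eigenvalue, v = top_eigenvector(cov)
        components.append(v)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Hypothetical data: two clusters of four points each in three dimensions.
points = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1),
          (10, 10, 0), (11, 10, 1), (10, 11, 0), (11, 11, 1)]
projected = pca_2d(points)
for original, reduced in zip(points, projected):
    print(original, [round(x, 2) for x in reduced])
```

In the projected data the first coordinate alone separates the two clusters, because the direction of greatest variance runs between them.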

24 of 31

PCA example: amino acid properties

From 23 features to 11 features

25 of 31

10x Cell Ranger

Example: clustering of PBMCs from a healthy donor based on differential expression (visualized with t-SNE):

http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_web_summary.html

26 of 31

Gene networks

27 of 31

Proteomics

  • Quantitative proteomics
  • Protein sequencing
  • Drug design

28 of 31

Proteogenomics

Methods, Tools and Current Perspectives in Proteogenomics

29 of 31

Proteogenomics

CNA - copy number aberrations

eQTL - expression quantitative trait loci

PTM - post-translational modification

Methods, Tools and Current Perspectives in Proteogenomics

30 of 31

Correlation between RNA levels and protein abundance

Gene‐specific correlation of RNA and protein levels in human cells and tissues

31 of 31

Proteogenomics example: colorectal cancer

  • Each row represents a sample.
  • Red and blue colors represent over- and under-expression, respectively

Methods, Tools and Current Perspectives in Proteogenomics