1 of 15

Day 1 – RNA-seq library prep and trimming

Presented by Beant Kapoor

08-14-2023

2 of 15

Why is RNA-seq done?

  • To quantify gene expression
  • The usual targets are protein-coding transcripts, i.e. messenger RNA (mRNA)
    • mRNA: 1-5% of total RNA
    • rRNA: 80-90% of total RNA
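
To see why enrichment or depletion matters, the percentages above can be turned into a rough read budget. This is a minimal sketch that assumes reads are sampled in direct proportion to RNA abundance, which is a simplification; the function name and the sequencing depth of 20 million reads are hypothetical and chosen only for illustration.

```python
def expected_mrna_reads(total_reads: int, mrna_fraction: float) -> int:
    """Estimate how many reads come from mRNA.

    Assumes (simplification) that reads are sampled in proportion
    to the fraction of mRNA in the sequenced RNA.
    """
    if not 0.0 <= mrna_fraction <= 1.0:
        raise ValueError("mrna_fraction must be between 0 and 1")
    return round(total_reads * mrna_fraction)


if __name__ == "__main__":
    total = 20_000_000  # hypothetical sequencing depth
    for fraction in (0.01, 0.05):  # mRNA is 1-5% of total RNA
        reads = expected_mrna_reads(total, fraction)
        print(f"mRNA fraction {fraction:.0%}: about {reads:,} of {total:,} reads")
```

Under this assumption, sequencing total RNA without enrichment or depletion would spend most of the reads on rRNA.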

3 of 15

RNA-seq library prep

  1. rRNA depletion or mRNA enrichment
  2. Conversion of RNA to cDNA
  3. Add sequencing adapters

TruSeq and SMART-seq library preps

4 of 15

RNA enrichment

mRNA

PolyA capture

rRNA depletion

5 of 15

TruSeq RNA sample prep

Enriched RNA

Fragment and prime

Reverse transcription

Second strand synthesis

6 of 15

TruSeq RNA sample prep

7 of 15

SMART-seq sample prep

Reverse transcription

Template switch

cDNA PCR

DNA library preparation

8 of 15

Trimming

  • Removing low-quality data or portions of sequence data that might interfere with our analyses
  • Why trim?
    • Erroneous base calls (low Q scores)
    • Reads contain synthetic oligos (e.g. adapters) from library preparation
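
The two reasons for trimming can be illustrated with a simplified sketch. It is not the algorithm of any particular trimming tool: it assumes Phred+33 quality encoding, matches the adapter exactly (real tools tolerate mismatches and partial adapters), and uses a hypothetical read. The function names and the threshold of Q20 are choices made for this example.

```python
def phred33_scores(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer Q scores."""
    return [ord(char) - 33 for char in quality]


def trim_adapter(seq: str, qual: str, adapter: str) -> tuple[str, str]:
    """Remove the first exact occurrence of the adapter and everything after it."""
    index = seq.find(adapter)
    if index == -1:
        return seq, qual  # adapter not present, read unchanged
    return seq[:index], qual[:index]


def trim_low_quality_3prime(seq: str, qual: str, min_q: int = 20) -> tuple[str, str]:
    """Remove bases from the 3' end until the last base has Q >= min_q."""
    scores = phred33_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Hypothetical read: 8 bases of insert followed by an adapter sequence.
    adapter = "AGATCGGAAGAGC"  # illustrative adapter string
    seq = "ACGTACGT" + adapter
    qual = "IIIIII##" + "I" * len(adapter)

    seq, qual = trim_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_3prime(seq, qual, min_q=20)
    print(seq, qual)
```

Here the adapter is removed first, and the two low-quality bases at the new 3' end are removed second.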

9 of 15

Example Fastq file
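
A FASTQ record has four lines: a header starting with `@`, the sequence, a separator starting with `+`, and one quality character per base. The sketch below parses a single record and decodes the quality string, assuming Phred+33 encoding. The record itself is made up for illustration and is not taken from the slide; the names `FastqRecord` and `parse_fastq_record` are hypothetical.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def parse_fastq_record(text: str) -> FastqRecord:
    """Parse one four-line FASTQ record from a string."""
    lines = text.strip().splitlines()
    if len(lines) != 4:
        raise ValueError("a FASTQ record must have exactly four lines")
    header, sequence, separator, quality = lines
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("malformed FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return FastqRecord(header[1:], sequence, quality)


def phred33_scores(quality: str) -> list[int]:
    """Convert Phred+33 characters to integer Q scores."""
    return [ord(char) - 33 for char in quality]


if __name__ == "__main__":
    # Hypothetical record for illustration only.
    example = """@read1 hypothetical example
GATTTGGGGTTCAAAGCAGT
+
IIIIIIIIIIFFFFF5555#
"""
    record = parse_fastq_record(example)
    for base, q in zip(record.sequence, phred33_scores(record.quality)):
        print(base, q)
```

In this encoding `I` decodes to Q40, `5` to Q20, and `#` to Q2, so the last base of the hypothetical read is a likely candidate for trimming.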

10 of 15

How is a Q-score calculated?

Q = -10 × log10(P), where P is the probability that the base call is wrong

Let’s say P = 0.01

That is a 1 in 100 probability that the base is miscalled

Q = -10 × log10(0.01)

Q = -10 × (-2)

Q = 20

11 of 15

How is a Q-score calculated?

Q = -10 × log10(P)

Let’s say P = 0.001

That is a 1 in 1,000 probability that the base is miscalled

Q = -10 × log10(0.001)

Q = -10 × (-3)

Q = 30
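
The two worked examples can be reproduced with a few lines of Python. This is a minimal sketch of the formula only; the function names are hypothetical, and the inverse conversion (from Q back to P) is added for convenience.

```python
import math


def phred_from_error_prob(p: float) -> float:
    """Q = -10 * log10(P), where P is the probability of a wrong base call."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P must be greater than 0 and at most 1")
    return -10 * math.log10(p)


def error_prob_from_phred(q: float) -> float:
    """Inverse of the formula above: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for p in (0.01, 0.001):
        print(f"P = {p}: Q = {phred_from_error_prob(p):.0f}")
    for q in (20, 30):
        print(f"Q = {q}: P = {error_prob_from_phred(q):.3f}")
```

Each increase of 10 in Q corresponds to a tenfold decrease in the probability of a miscalled base.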

12 of 15

How is a Q-score calculated?

illumina.com

13 of 15

How is a Q-score calculated?

[Figure from illumina.com; panels labeled RTA2 and RTA3]

14 of 15

How reliable is a Q-score?

illumina.com

15 of 15

Summary - when to trim?

  • When you know your data has adapter contamination
  • If you’re performing genome sequencing or SNP calling
  • If the sequence data have low quality scores on average
  • If you have many reads to spare