June 6, 2025
Bulk RNA-Seq Analysis
MSM-NYGC Sequence Informatics Workshop
Heather Geiger, Rui Fu
@nygenome
@notsojunkdna
Outline
RNA-Seq
Typical RNA-Seq process
Griffith et al., 2015
Fragmentation
RNA-seq library fragmentation and size selection
Griffith et al., 2015
Isolation of RNA
RNA-Seq alignment
[Figure: a “gene” model with 5’UTR, exon 1, intron, exon 2, intron, exon 3, and 3’UTR, and “a read” shown as the pair R1 and R2]
[Figure: “many reads” aligned to the gene, corresponding to different insert sizes]
RNA-Seq workflow
[Figure, shown in steps: “a read” goes through “Align”; the same is then done for “many reads”]
Split alignment
however…
Genes don’t exist!
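[Figure: a “gene” drawn as three “transcripts” (or isoforms), numbered 1, 2, and 3, with reads annotated by the isoform they support. Annotations: “These 5 reads support isoform 1”; “support isoform 3”; “support isoform 2”; “doesn’t only support isoform 2”; “support isoform 1”.]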
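The point that a read may support a single isoform or several can be sketched in a few lines of Python. This is only an illustration: the three isoform structures below are hypothetical (the slides do not spell out which exons each isoform contains), and a read is reduced to the chain of exons it covers. A read is treated as compatible with an isoform when its exon chain appears as a contiguous run in that isoform.

```python
# Hypothetical isoform structures (assumed for illustration only):
# each isoform is the ordered chain of exons it contains.
ISOFORMS = {
    "isoform 1": ("exon 1", "exon 2", "exon 3"),
    "isoform 2": ("exon 1", "exon 3"),
    "isoform 3": ("exon 2", "exon 3"),
}


def is_compatible(read_exons, isoform_exons):
    """Return True if the read's exon chain is a contiguous run in the isoform."""
    n = len(read_exons)
    return any(
        tuple(isoform_exons[i:i + n]) == tuple(read_exons)
        for i in range(len(isoform_exons) - n + 1)
    )


def supported_isoforms(read_exons, isoforms=ISOFORMS):
    """Return the sorted names of all isoforms the read is compatible with."""
    return sorted(
        name for name, exons in isoforms.items()
        if is_compatible(read_exons, exons)
    )


# A spliced read pins down a junction; a read inside a shared exon does not.
example_reads = [
    ("exon 1", "exon 2"),  # junction found only in isoform 1
    ("exon 1", "exon 3"),  # exon-skipping junction, only in isoform 2
    ("exon 3",),           # shared exon, ambiguous
]
for read in example_reads:
    print(read, "->", supported_isoforms(read))
```

Reads that fall entirely inside a shared exon are the ambiguous ones, which is why transcript-level quantification has to assign them probabilistically rather than by simple counting.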
Quantification
featureCounts
What do we do with counts?
Counts/RPKM/TPM
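To make the difference between the three units concrete, here is a minimal sketch using the standard RPKM and TPM definitions. The gene names, counts, and lengths are made-up toy values, and the library size is assumed to be the sum of the counted reads (in practice it is usually the total number of mapped reads).

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million reads."""
    # Assumption: library size = sum of the counts passed in.
    total_reads = sum(counts.values())
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_reads)
        for gene in counts
    }


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rate.values())
    return {gene: rate[gene] / scale * 1e6 for gene in counts}


# Toy data (hypothetical genes, counts, and lengths).
counts = {"geneA": 100, "geneB": 200, "geneC": 300}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 1000}

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)

for gene in counts:
    print(gene, counts[gene], round(rpkm_values[gene], 1), round(tpm_values[gene], 1))
```

In this toy data geneB has twice the raw count of geneA but is also twice as long, so both end up with the same RPKM and the same TPM. TPM values always sum to one million within a sample, which is not true of RPKM.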
Unique reads
Duplicated reads
Alternative pipeline
Kallisto
Differential expression
Why do we need replicates?
With replicates
Complex design
Batch effect
Alternative splicing
How many reads? Read length
Fusion gene discovery
Fusion genes
Immune cell decomposition
“Tumor RNA-Seq” is the sequencing of RNA molecules present in a sample composed of tumor cells, stromal cells, and normal cells.
The aggregated expression profile can be decomposed to estimate the proportion of each cell type in the sample.
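The decomposition idea can be sketched as a small non-negative least squares problem: the bulk profile is modeled as a weighted sum of per-cell-type reference profiles, and the weights are the estimated proportions. The code below is a simplified illustration, not the method of any specific deconvolution tool. The signature matrix, the cell type names, and the mixture values are invented, and the solver is a plain projected gradient descent written in pure Python.

```python
def estimate_fractions(signature, mixture, n_iter=2000):
    """Estimate cell-type fractions by non-negative least squares.

    signature: list of rows, one per gene, each with one value per cell type.
    mixture:   list with one bulk expression value per gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Step size 1/L, where L (the sum of squared entries) is an upper
    # bound on the largest eigenvalue of S^T S.
    step = 1.0 / sum(v * v for row in signature for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(signature[g][k] * weights[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto weights >= 0.
        weights = [max(0.0, w - step * d) for w, d in zip(weights, gradient)]
    total = sum(weights)
    if total == 0:
        return weights
    return [w / total for w in weights]


# Hypothetical reference profiles: rows are genes, columns are cell types.
cell_types = ["tumor", "stromal", "immune"]
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 1.0],
    [1.0, 1.0, 12.0],
    [5.0, 5.0, 5.0],
]
# Bulk profile generated from a 60% / 30% / 10% mix of the columns above.
mixture = [6.4, 3.1, 2.1, 5.0]

fractions = estimate_fractions(signature, mixture)
for name, fraction in zip(cell_types, fractions):
    print(name, round(fraction, 3))
```

Real data are noisier than this toy mixture, so the estimate depends heavily on how well the reference profiles separate the cell types.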
Immuno-oncology, neoantigen
Variant calling?
RNA editing
SNPiR: Piskol et al., AJHG 2013
Novel transcripts
Annotation
Long read technologies
Dos and don’ts
Want more?