How to deal with your RNA-seq data?
Emilie Drouineau, Rachel Legendre & the RNA-seq team
École de Bioinformatique AVIESAN-IFB-Inserm 2022
1 | Emilie Drouineau | Bioinformatics | 15/11/2022
EBAII niv 1 2022
Summary
01 Bioinformatics: Quality control, mapping, counting
02 Statistics: Experimental design, exploratory data analysis
03 Statistics: Normalization, modelling and troubleshooting
04 Practice: Differential analysis with SARTools
05 Advanced practice: Gene set analysis methods
06 Bioinformatics: Transcriptome de novo assembly
Bioinformatics
Introduction and prerequisites
Raw NGS data
Instrument
Flowcell
Intensities
Data storage: HiSeq 2500
Data storage: NovaSeq 6000
RNA-seq applications
“Transcriptome analysis provides information about the identity and quantity of all RNA molecules in one cell or a population of cells.”
RNA-seq: Why? How?
Ask the right questions before library preparation and sequencing:
Prokaryotes
I can’t find a ribo-depletion kit for my organism:
I want to identify antisense RNA:
I’m interested in transposons:
Eukaryotes
I want coding genes only:
I also want non-coding genes:
I’m interested in small RNA profiling:
I’m interested in isoforms:
Regardless of your organism:
For a successful experiment, it is imperative to involve bioinformaticians and biostatisticians before RNA extraction begins.
Prerequisites
RNA sample:
Reference genome:
Complete genomic sequence in FASTA format
Annotation file:
All features (genes, CDS, introns, UTRs) of the genome in GFF/GTF format
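To make the annotation format concrete, here is a minimal sketch of how one GTF line could be read into its nine tab-separated columns. The function name `parse_gtf_line` and the example line are assumptions made for illustration: the coordinates, source and strand are invented, and only the gene identifier comes from this course.

```python
def parse_gtf_line(line):
    """Split one GTF line into a dict of its nine standard columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GTF line must have 9 tab-separated columns")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields

    # The ninth column holds 'key "value";' pairs.
    attributes = {}
    for item in attrs.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),      # 1-based, inclusive
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# Hypothetical line: coordinates, source and strand are invented.
example = (
    "Chr4\texample\texon\t1000\t1500\t.\t+\t.\t"
    'gene_id "AT4G31120"; transcript_id "AT4G31120.1";'
)
record = parse_gtf_line(example)
print(record["feature"], record["start"], record["end"],
      record["attributes"]["gene_id"])
```

GFF3 uses the same nine columns but writes attributes as `key=value` pairs, so the attribute parsing above would need adapting for that format.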
Where to find the genome and the annotation?
Common databases | Specific databases
Keep control of your data
FASTQC: explore quality scores
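As a reminder of what the quality scores mean, the sketch below decodes a FASTQ quality string into Phred scores and converts a score into an error probability, using Q = -10 log10(P). It assumes the Phred+33 encoding; the function names and the quality string are invented for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 by default)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Hypothetical quality string, invented for illustration.
scores = phred_scores("II5+!")
mean_quality = sum(scores) / len(scores)

print(scores)                 # [40, 40, 20, 10, 0]
print(mean_quality)           # 22.0
print(error_probability(20))  # Q20: about 1 wrong call in 100
```

A score of 30 corresponds to about one error in 1,000 base calls, which is why Q30 is often used as a rule-of-thumb threshold.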
Systematically high duplication levels in RNA-seq: why?
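One likely part of the answer is that highly expressed genes produce many identical reads, so duplicates in RNA-seq are not only PCR artefacts. The sketch below illustrates this with a naive duplication rate on toy reads. The function `duplication_rate` and the sequences are invented for illustration, and FastQC computes its own estimate differently.

```python
from collections import Counter


def duplication_rate(sequences):
    """Fraction of reads that are extra copies of an already seen sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - len(counts) / total


# Toy data: one highly expressed transcript yields six identical reads,
# four weakly expressed transcripts yield one read each.
reads = ["ACGT"] * 6 + ["TTGA", "GGCC", "CATG", "AAAA"]
rate = duplication_rate(reads)
print(rate)  # 0.5
```

Here half of the reads count as duplicates although no PCR duplication was simulated.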
How to screen for contamination?
Different levels:
As soon as you visualise your reads against an annotated genome, the presence of genomic DNA is normally fairly apparent as a consistent background of reads over the whole genome.
Bioinformatics
From mapping to counting
RNA-seq mapping specificity
Cole Trapnell & Steven L. Salzberg. Nature Biotechnology 27, 455–457 (2009)
Mapping timeline
From https://www.ebi.ac.uk/~nf/hts_mappers/
Choose the right mapper
Which mapper is the best?
Which mapper should I use, given my data and my analysis?
It depends on:
- Detection of splicing events: STAR, minimap2, HISAT2
- Length of reads:
  Very short reads (<50 bp): Bowtie1
  Up to 1000 kb: BWA-SW, Bowtie2
  Long reads: minimap2
- Gaps allowed in the alignment: STAR, BWA, Bowtie2
Common situations: choose a widely used, well-maintained mapper
Known biases in RNA-seq
Intron coverage: if many reads align to introns, this is indicative of incomplete poly(A) enrichment or abundant presence of immature transcripts.
Intergenic reads: if a significant portion of reads is aligned outside of annotated gene sequences, this may suggest genomic DNA contamination (or abundant non-coding transcripts).
3' bias: over-representation of 3' portions of transcripts indicates RNA degradation.
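The first two checks above boil down to asking where reads fall relative to the annotation. The sketch below classifies read positions as exonic, intronic or intergenic on a toy gene model. The gene, its coordinates and the read positions are all invented, and real QC tools work on full alignments rather than single positions.

```python
from collections import Counter


def classify_position(pos, genes):
    """Label a 1-based position as exonic, intronic or intergenic."""
    in_gene = False
    for gene in genes:
        if gene["start"] <= pos <= gene["end"]:
            in_gene = True
            if any(start <= pos <= end for start, end in gene["exons"]):
                return "exonic"
    return "intronic" if in_gene else "intergenic"


# Hypothetical gene with two exons (coordinates are 1-based, inclusive).
genes = [{"start": 100, "end": 500, "exons": [(100, 200), (400, 500)]}]

# Hypothetical read midpoints.
positions = [150, 180, 450, 300, 50, 600, 120, 410]

labels = Counter(classify_position(p, genes) for p in positions)
fractions = {label: n / len(positions) for label, n in labels.items()}
print(fractions)
```

A large intronic or intergenic fraction would point to the issues listed above; what counts as "large" depends on the library type and the organism.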
Mapping QC on RNA-seq
Quantify the number of reads on each gene
When counting reads, make sure you know how the program handles the following:
Two popular tools:
Deschamps-Francoeur et al. 2020. doi:10.1016/j.csbj.2020.06.014
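To see why these choices matter, here is a simplified sketch of one possible counting rule: a read is assigned to a gene only if it maps to a single location and overlaps exactly one gene. This is an assumption made for illustration, not the implementation of any particular tool; the genes, reads and category labels (`multi_mapped`, `ambiguous`, `no_feature`) are invented.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_reads(reads, genes):
    """Count reads per gene, setting aside reads that cannot be assigned."""
    counts = Counter()
    for read in reads:
        if read["n_hits"] > 1:
            # Read aligned to several loci: not assigned in this sketch.
            counts["multi_mapped"] += 1
            continue
        hits = [name for name, (start, end) in genes.items()
                if overlaps(read["start"], read["end"], start, end)]
        if not hits:
            counts["no_feature"] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1
        else:
            counts[hits[0]] += 1
    return counts


# Hypothetical overlapping genes and reads.
genes = {"geneA": (100, 300), "geneB": (250, 500)}
reads = [
    {"start": 120, "end": 170, "n_hits": 1},  # geneA only
    {"start": 260, "end": 310, "n_hits": 1},  # overlaps geneA and geneB
    {"start": 400, "end": 450, "n_hits": 1},  # geneB only
    {"start": 600, "end": 650, "n_hits": 1},  # outside any gene
    {"start": 130, "end": 180, "n_hits": 3},  # multi-mapped
    {"start": 150, "end": 200, "n_hits": 1},  # geneA only
]
counts = count_reads(reads, genes)
print(dict(counts))
```

With this rule, only three of the six reads contribute to gene counts. A different policy for overlapping genes or multi-mapped reads would change the result.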
RNA-seq experiment
Organism: Arabidopsis thaliana, a model plant organism.
Genome and annotation: TAIR10 release, available from TAIR, the Arabidopsis database.
Dataset: 3 biological replicates, paired-end sequencing.
Goal: characterize the function of the protein arginine methyltransferase AtPRMT5 during de novo shoot regeneration in Arabidopsis, using an AtPRMT5 knockout.
Practice
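# Move to your project directory (replace <PROJECT> with your project name)
cd /shared/projects/<PROJECT>
# Create the working directory for this practical (no error if it already exists)
mkdir -p TP_rnaseq
# Copy the course script into the working directory
cp /shared/projects/form_2022_32/coursLinux/rna-seq/01-Bioinfo/runme.sh TP_rnaseq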
Pipeline
Bioinformatics
Visualize your data
Visualize alignments
Which format?
Which tools?
Go to AT4G31120