1 of 9

R / DESeq2 Workshop

Applied Computational Genomics, Lecture 23

https://github.com/quinlan-lab/applied-computational-genomics

Aaron Quinlan

Departments of Human Genetics and Biomedical Informatics

USTAR Center for Genetic Discovery

University of Utah

quinlanlab.org

2 of 9

Assumptions

  • You have already aligned your RNA-seq or ChIP-seq data
  • You have already created a "counts matrix" from those alignments that summarizes the number of reads aligning to each gene/transcript/peak
  • You have installed R version 3.3 and/or RStudio

3 of 9

A Brief Introduction to R

https://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf

4 of 9

DESeq2 overview

https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

A basic task in the analysis of count data from RNA-seq is the detection of differentially expressed genes. The count data are presented as a table which reports, for each sample, the number of sequence fragments that have been assigned to each gene. Analogous data also arise for other assay types, including comparative ChIP-Seq, HiC, shRNA screening, and mass spectrometry. An important analysis question is the quantification and statistical inference of systematic changes between conditions, as compared to within-condition variability. The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models; the estimates of dispersion and logarithmic fold changes incorporate data-driven prior distributions (1). This vignette explains the use of the package and demonstrates typical workflows. An RNA-seq workflow (2) on the Bioconductor website covers similar material to this vignette but at a slower pace, including the generation of count matrices from FASTQ files.
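
As a compact reminder of what "negative binomial generalized linear model" means here, the model can be written roughly as below. The notation follows the DESeq2 paper cited as (1) and does not appear on the original slide: K_ij is the count for gene i in sample j, s_j a per-sample size factor, alpha_i a per-gene dispersion, and x_jr the design matrix entries.

```latex
\begin{aligned}
K_{ij} &\sim \mathrm{NB}\!\left(\mu_{ij},\ \alpha_i\right), \\
\mu_{ij} &= s_j \, q_{ij}, \\
\log_2 q_{ij} &= \sum_r x_{jr} \, \beta_{ir}, \\
\operatorname{Var}(K_{ij}) &= \mu_{ij} + \alpha_i \, \mu_{ij}^2 .
\end{aligned}
```

The coefficients beta_ir are the log2 fold changes that the package reports, and the variance line shows how the dispersion captures variability beyond Poisson noise.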

5 of 9

DESeq2 uses "raw" (unnormalized) counts

https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

As input, the DESeq2 package expects count data as obtained, e.g., from RNA-seq or another high-throughput sequencing experiment, in the form of a matrix of integer values. The value in the i-th row and the j-th column of the matrix tells how many reads can be assigned to gene i in sample j. Analogously, for other types of assays, the rows of the matrix might correspond e.g. to binding regions (with ChIP-Seq) or peptide sequences (with quantitative mass spectrometry). We will list methods for obtaining count matrices in sections below. The values in the matrix should be un-normalized counts of sequencing reads (for single-end RNA-seq) or fragments (for paired-end RNA-seq). The RNA-seq workflow describes multiple techniques for preparing such count matrices. It is important to provide count matrices as input for DESeq2’s statistical model [1] to hold, as only the count values allow assessing the measurement precision correctly. The DESeq2 model internally corrects for library size, so transformed or normalized values such as counts scaled by library size should not be used as input.
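
To make the "raw counts only" requirement concrete, here is a minimal base R sketch. The toy matrix and the helper is_raw_counts() are hypothetical and are not part of DESeq2; they only illustrate the difference between integer counts and library-size-scaled values.

```r
# Hypothetical toy data: 4 genes x 3 samples of raw read counts.
counts <- matrix(
  c(120L,  98L, 143L,
      0L,   3L,   1L,
     57L,  61L,  49L,
    880L, 910L, 765L),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Hypothetical helper: TRUE if the matrix holds non-negative whole numbers.
is_raw_counts <- function(m) {
  is.numeric(m) && !anyNA(m) && all(m >= 0) && all(m == round(m))
}

# Raw counts pass the check.
is_raw_counts(counts)

# Counts per million: each column is divided by its library size.
# The result is fractional and should not be given to DESeq2.
cpm <- t(t(counts) / colSums(counts)) * 1e6
is_raw_counts(cpm)
```

The first check returns TRUE and the second FALSE, which is the distinction the paragraph above draws.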

6 of 9

featureCounts: making a counts matrix

http://bioinf.wehi.edu.au/featureCounts/
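
A featureCounts output table starts with a comment line, followed by a header whose first six columns are annotation (Geneid, Chr, Start, End, Strand, Length) and whose remaining columns hold one count per input BAM file. The sketch below writes a tiny stand-in file so that it runs on its own; the gene names, sample names, and values are made up for illustration.

```r
# Write a tiny stand-in for a featureCounts output file (hypothetical values).
fc_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Program:featureCounts (toy example)",
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl1.bam\ttreat1.bam",
  "geneA\tchr1\t100\t900\t+\t801\t15\t42",
  "geneB\tchr1\t2000\t2500\t-\t501\t0\t7"
), fc_file)

# Skip the leading comment line; keep sample names exactly as written.
fc <- read.delim(fc_file, comment.char = "#", check.names = FALSE)

# Drop the six annotation columns and keep one column per sample.
counts <- as.matrix(fc[, -(1:6)])
rownames(counts) <- fc$Geneid
counts
```

With a real file, only the path passed to read.delim() would change; the resulting integer matrix has genes as rows and samples as columns.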

7 of 9

Installing DESeq2

https://bioconductor.org/packages/release/bioc/html/DESeq2.html
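
For reference, the installation route documented on current Bioconductor package pages goes through the BiocManager package. This is a sketch that assumes a reasonably recent R; on older releases, such as the R 3.3 mentioned in the assumptions, the legacy biocLite installer was used instead, so check the page above for your version.

```r
# Install BiocManager from CRAN only if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install DESeq2 from Bioconductor only if it is missing.
if (!requireNamespace("DESeq2", quietly = TRUE)) {
  BiocManager::install("DESeq2")
}
```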

8 of 9

DESeq2 tutorial

https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
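
The core of the tutorial can be sketched in a few lines. The example below assumes DESeq2 is installed and uses simulated counts with a made-up two-group design (three "control" and three "treated" samples); with real data, the counts matrix and sample table would come from your own experiment.

```r
library(DESeq2)

# Simulated raw counts: 1000 genes x 6 samples (hypothetical data).
set.seed(1)
n_genes <- 1000
gene_means <- 2^runif(n_genes, min = 4, max = 12)
counts <- matrix(
  rnbinom(n_genes * 6, mu = rep(gene_means, 6), size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("sample", 1:6))
)

# Sample table: row names must match the column names of the counts matrix.
coldata <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 3)),
  row.names = colnames(counts)
)

# Build the dataset, fit the model, and extract the results table.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)

# Show the genes with the smallest adjusted p-values.
head(res[order(res$padj), ])
```

Because "control" sorts first among the factor levels, the reported log2 fold changes compare treated against control. Since the simulated groups are drawn from the same distribution, few or no genes are expected to reach significance here.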

9 of 9

Making "volcano" plots

http://www.gettinggeneticsdone.com/2014/05/r-volcano-plots-to-visualize-rnaseq-microarray.html
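
A volcano plot puts the log2 fold change on the x-axis and -log10 of the p-value on the y-axis. The base R sketch below uses a simulated results table whose column names mirror those of a DESeq2 results object; the thresholds (adjusted p-value below 0.05, absolute log2 fold change above 1) are illustrative choices, not recommendations from the linked post.

```r
# Simulated results table (hypothetical values, DESeq2-style column names).
set.seed(42)
n <- 500
res <- data.frame(
  log2FoldChange = rnorm(n, mean = 0, sd = 1.5),
  pvalue = runif(n)^3
)
res$padj <- p.adjust(res$pvalue, method = "BH")

# Flag genes passing both the significance and the effect-size thresholds.
sig <- !is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1

# Draw the volcano plot, highlighting the flagged genes in red.
plot(res$log2FoldChange, -log10(res$pvalue),
     pch = 20,
     col = ifelse(sig, "red", "grey50"),
     xlab = "log2 fold change",
     ylab = "-log10(p-value)",
     main = "Volcano plot")
abline(v = c(-1, 1), lty = 2)
```

To plot real results, a DESeq2 results object can be converted with as.data.frame() and used in place of the simulated table.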