RNASeq (Model organism)

Advanced tutorial

http://genome.edu.au

Mahtab Mirmomeni & Andrew Lonie

http://vlsci.org.au

Contents

Tutorial Overview

Background [15 min]

Preparation [10 min]

Section 2: Alignment [40 mins]

Section 3. CuffDiff [30 min]

Section 4. Count reads in features with htseq-count  [30 min]

Section 5: Differential gene expression analysis using EdgeR  [30 min]

Section 6. Differential gene expression analysis using DeSeq2 [30 min]

Section 7: How much overlap is there in the 3 differential expression outputs?

Section 8: Biological interpretation using gene set enrichment analysis

References

Tutorial Overview

In this tutorial we compare the performance of three statistically-based expression analysis tools:

  • CuffDiff
  • EdgeR
  • DESeq2

Background [15 min]

Read the background to the workshop here

Where is the data in this tutorial from?

The data for this tutorial is from the paper ‘A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae’ by Nookaew et al. [1], which studies S. cerevisiae strain CEN.PK 113-7D (yeast) under two different metabolic conditions: glucose-excess (batch) and glucose-limited (chemostat).

The RNA-Seq data has been uploaded to the NCBI Sequence Read Archive (SRA) under accession SRS307298. There are 6 samples in total: two treatments with three biological replicates each. The data is paired-end.

Batch: Batch1 = SRR453566 , Batch2 = SRR453567, Batch3 = SRR453568

Chemostat: Chem1 = SRR453569, Chem2 = SRR453570, Chem3 = SRR453571

We have extracted only chromosome I reads from the samples to make the tutorial of suitable length. This has implications, as discussed in section 8.
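The sample-to-accession mapping above can be captured in a small lookup table. The sketch below is only an illustration: the accessions come from the list above, the file-name pattern `<sample>_chrI_<mate>.fastq` matches the files used in the Preparation section, and the helper names `condition_of` and `fastq_names` are made up for this example.

```python
# Sample sheet for the six paired-end libraries used in this tutorial.
# Accessions are copied from the text; helper names are hypothetical.
SAMPLES = {
    "batch1": "SRR453566",
    "batch2": "SRR453567",
    "batch3": "SRR453568",
    "chem1": "SRR453569",
    "chem2": "SRR453570",
    "chem3": "SRR453571",
}


def condition_of(sample):
    """Return the condition name by stripping the trailing replicate number."""
    return sample.rstrip("0123456789")


def fastq_names(sample):
    """Return the (forward, reverse) FASTQ file names for one sample."""
    return (f"{sample}_chrI_1.fastq", f"{sample}_chrI_2.fastq")


for sample, accession in SAMPLES.items():
    forward, reverse = fastq_names(sample)
    print(condition_of(sample), sample, accession, forward, reverse, sep="\t")
```

Keeping this table in one place makes it easier to check that every replicate has both a forward and a reverse file before starting the alignment.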

Preparation [10 min]

1. Register as a new user in Galaxy if you don’t already have an account (what is Galaxy?)

  1. Open a browser and go to the Galaxy server: http://galaxy-tut.genome.edu.au
  • NOTE: Firefox/Safari/Chrome all work well, Internet Explorer not so well
  1. Register as a new user: User>Register or login if you already have an account

2. Import the data for the workshop 

You can do this in a few ways, of which by far the easiest is importing a published history on your Galaxy instance:

  1. Go to Shared Data -> Published Histories and click on RNASeqDGE_ADVNCD_Prep
  • Click 'Import History' at top right, wait for the history to be imported to your account, and then ‘start using this history’.
  • This will create a new Galaxy history in your account with all of the required data files
  • Proceed to Section 2

Alternatively, if you’re using a different Galaxy instance, upload the files directly into your Galaxy using the ‘Get Data > Upload File’ functionality.

  1. Firstly upload the 12 sequence files. ** Make sure you specify that the files are ‘fastqsanger’ format **

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/batch1_chrI_1.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/batch1_chrI_2.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/batch2_chrI_1.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/batch2_chrI_2.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/batch3_chrI_1.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/batch3_chrI_2.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/chem1_chrI_1.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/chem1_chrI_2.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/chem2_chrI_1.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/chem2_chrI_2.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/chem3_chrI_1.fastq

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/chem3_chrI_2.fastq

  1. Then, upload this file of gene definitions, specifying ‘gtf’ format

https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/genes.gtf

OR

Import this History archive (History panel > cog icon in top right > Import from File): https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/Galaxy-History-RNAseqDGE_ADVNCD_Prep.tar.gz

3. You should now have these files in your history:

  1. batch1_chrI_1.fastq
  2. batch1_chrI_2.fastq
  3. batch2_chrI_1.fastq
  4. batch2_chrI_2.fastq
  5. batch3_chrI_1.fastq
  6. batch3_chrI_2.fastq
  7. chem1_chrI_1.fastq
  8. chem1_chrI_2.fastq
  9. chem2_chrI_1.fastq
  10. chem2_chrI_2.fastq
  11. chem3_chrI_1.fastq
  12. chem3_chrI_2.fastq
  13. genes.gtf

Note: The reads are paired end; for example batch1_chrI_1.fastq and batch1_chrI_2.fastq are paired reads from one sequencing run. Low quality reads have already been trimmed.
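If you want to convince yourself that two FASTQ files really are mates, one simple check is that they list the same read identifiers in the same order. The following is a minimal sketch of that check, not part of the Galaxy workflow. It assumes 4-line FASTQ records and that mates may be marked with a trailing `/1` or `/2`; the read names in the example are invented.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read identifier from each 4-line FASTQ record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # only the first line of each record holds the identifier
        fields = line.split()
        if not fields:
            continue  # tolerate a stray blank line
        name = fields[0].lstrip("@")
        # Assumption: mates may carry a trailing /1 or /2 marker.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def pairs_match(forward_handle, reverse_handle):
    """Return True if both files list the same read IDs in the same order."""
    paired = zip_longest(read_ids(forward_handle), read_ids(reverse_handle))
    return all(first == second for first, second in paired)


# Tiny invented example standing in for batch1_chrI_1.fastq / batch1_chrI_2.fastq.
forward = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n")
reverse = io.StringIO("@read1/2\nCGTA\n+\nIIII\n@read2/2\nGGCA\n+\nIIII\n")
print(pairs_match(forward, reverse))
```

For real data you would pass open file handles instead of the in-memory strings; because the comparison is streamed, the files do not need to fit in memory.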

Completed Galaxy history for this section (in SharedData>Published Histories): RNASeqDGE_ADVNCD_Prep

OR

Import this History archive (History panel > cog icon in top right > Import from File): https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/Galaxy-History-RNAseqDGE_ADVNCD_Prep.tar.gz

Section 2: Alignment [40 mins]

The basic process here is to map each individual read in a sample to the reference genome, so that we can later count how many reads come from each gene according to some gene model, and then compare gene counts between conditions. In this tutorial, the Tophat aligner is used to map the RNA-Seq reads to the S. cerevisiae reference genome.

More detailed explanation on Tophat here.

  1. Map/align the reads with Tophat to the S. cerevisiae reference

NGS: RNA Analysis>Tophat2

  1. Is this library mate-paired: Paired-end
  2. RNA-Seq FASTQ file, forward reads: batch1_chrI_1.fastq
  3. RNA-Seq FASTQ file, reverse reads: batch1_chrI_2.fastq
  4. Use a built in reference genome or own from your history: Use a built-in genome
  5. Select a reference genome: saccharomyces cerevisiae sacCer2 
  6. Keep all other defaults
  7. Execute
  1. Five files are generated per sample. The file *-accepted_hits.bam has the read mappings
  1. When complete, rename the accepted-hits file into a more meaningful name - eg 'batch1-accepted_hits.bam'

  1. Repeat for all other fastq datasets

  1. View an aligned BAM file in Trackster
  1. In the history panel for a newly generated BAM file, click on the ‘visualize’ icon.
  2. Select Trackster, View in a new visualization
  3. Browser name: RNAseqDGE
  4. Reference genome build (dbkey): sacCer2
  5. Click ‘Create’
  6. When the trackster window opens, select chrI in the chromosomal region drop down box
  1. Note that the window can take a little time to prepare and download the data
  1. Scroll around and zoom in and out in the trackster genome viewer to get a feel for the data
  1. Add one of the Tophat-generated splice junction files to the visualization
  1. Click on the small ‘+’ icon in the top right of the Trackster window
  2. Select one of the ‘*.splice junctions’ files
  1. Try zooming in on an area with an intron: chrI:148518-153621, or chrI:86985-87795
  2. Ideally we would add a gene model to the visualisation, but the genes.gtf file for S. cerevisiae (as downloaded from the UCSC Table Browser) uses a slightly different name for one of the chromosomes than the reference genome used by Galaxy, which causes Trackster to throw an error if you try to add it. This is very typical of genomics currently! If you are interested, you can edit the genes.gtf file to rename the chromosome ‘2-micron’ to ‘2micron’, which will fix the problem.
  1. To get back to the analysis view, click on the ‘Analyze Data’ link in the top menu.
  1. If you want both the analysis and visualization, reopen the Analysis in a new browser tab or window (right click on the ‘Analyze Data’ and select the appropriate option)
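For readers working outside Galaxy, the per-sample alignment in this section corresponds roughly to one command-line call of tophat2 per pair of FASTQ files. The sketch below only builds and prints such commands, it does not run them. The index prefix `sacCer2` and the output directory names are assumptions for illustration, and the exact options used by the Galaxy wrapper may differ.

```python
import shlex


def tophat2_command(index_prefix, forward_fastq, reverse_fastq, output_dir):
    """Build a tophat2 argument list for one paired-end sample."""
    return ["tophat2", "-o", output_dir, index_prefix, forward_fastq, reverse_fastq]


samples = ["batch1", "batch2", "batch3", "chem1", "chem2", "chem3"]
for sample in samples:
    command = tophat2_command(
        "sacCer2",                  # hypothetical Bowtie2 index prefix
        f"{sample}_chrI_1.fastq",   # forward reads
        f"{sample}_chrI_2.fastq",   # reverse reads
        f"{sample}_tophat_out",     # hypothetical output directory
    )
    print(shlex.join(command))
```

Each run writes its read mappings to an accepted_hits.bam file in the chosen output directory, which is the file renamed in step 1 above.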

Completed Galaxy history for this section (in SharedData>Published Histories): RNASeqDGE_ADVNCD_Sec2

Section 3. CuffDiff [30 min]

The aim here is to:

- Generate tables of normalised read counts per gene per sample based on an annotated reference transcriptome

- Statistically test for significant difference in normalised read counts gene-by-gene and transcript-by-transcript, taking into account inter-sample variance

- Assign each gene a probability score that it is genuinely differentially expressed between the two conditions and estimate the fold difference in expression

CuffDiff is part of the Cufflinks pipeline for differential gene expression analysis: Read more about Cuffdiff here

  1. Generate a list of differentially expressed genes using CuffDiff

            NGS: RNA Analysis > Cuffdiff

  1. Transcripts: genes.gtf
  2. Conditions > Condition 1 > Name: batch
  3. Replicates > Replicate 1 > Add Replicate > batch1-accepted_hits.bam
  4. Add new Replicate > Replicate 2 > Add Replicate > batch2-accepted_hits.bam
  5. Add new Replicate > Replicate 3 > Add Replicate > batch3-accepted_hits.bam
  6. Add new group>Condition 2>Name: chem
  7. Replicates > Replicate 1 > Add file -> chem1-accepted_hits.bam
  8. Add new Replicate > Replicate 2 > Add Replicate > chem2-accepted_hits.bam
  9. Add new Replicate > Replicate 3 > Add Replicate > chem3-accepted_hits.bam
  10. Keep rest defaults
  11. Execute

  1. Filter out the significant differentially expressed genes 

           Filter and Sort -> Filter

  1. Filter: ‘Cuffdiff on data ... and others: gene differential expression testing’
  2. With following condition: c14=='yes'
  3. Number of header lines to skip: 1
  4. Execute

      Rename the filter file to a more meaningful name. eg: CuffDiff-SignificantlyExpressedGenes

  1. Check the generated list of differentially expressed genes

There should be ~50 differentially expressed genes in this list. 
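The Filter step above keeps the rows whose 14th column equals ‘yes’ and ignores the first (header) line. The same logic can be sketched in a few lines of Python. This is not the Galaxy tool's implementation: `filter_rows` is a hypothetical helper, the column names are those usually seen in Cuffdiff's gene-level output (treated here as an assumption), and the two data rows are invented.

```python
def filter_rows(lines, column, predicate, header_lines=1):
    """Return the data rows whose 1-based `column` satisfies `predicate`.

    The first `header_lines` rows are not tested; this sketch simply leaves
    them out of the result.
    """
    kept = []
    for index, line in enumerate(lines):
        if index < header_lines:
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column and predicate(fields[column - 1]):
            kept.append(fields)
    return kept


# Assumed column layout; the data rows are invented for illustration only.
header = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
          "status", "value_1", "value_2", "log2(fold_change)", "test_stat",
          "p_value", "q_value", "significant"]
rows = [
    ["id1", "id1", "geneA", "chrI:100-500", "batch", "chem", "OK",
     "10.0", "80.0", "3.0", "5.1", "0.0001", "0.002", "yes"],
    ["id2", "id2", "geneB", "chrI:900-1500", "batch", "chem", "OK",
     "20.0", "21.0", "0.07", "0.2", "0.8", "0.9", "no"],
]
table = ["\t".join(header)] + ["\t".join(row) for row in rows]

# Equivalent of the Galaxy condition c14=='yes'.
significant = filter_rows(table, 14, lambda value: value == "yes")
print([fields[2] for fields in significant])
```

The same helper covers numeric conditions such as the `c6 <= 0.05` filters used in Sections 5 and 6, for example `filter_rows(lines, 6, lambda value: float(value) <= 0.05)`.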

Completed Galaxy history for this section (in SharedData>Published Histories): RNASeqDGE_ADVNCD_Sec3

OR

Import this History archive (History panel > cog icon in top right > Import from File): https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/Galaxy-History-RNAseqDGE_ADVNCD_Sec3.tar.gz

Section 4. Count reads in features with htseq-count  [30 min]

Htseq-count counts, for each BAM file, the number of reads that map to each genomic feature defined in genes.gtf. The result is a count matrix: for each feature (a gene, for example), it shows how many reads from each sample were mapped to that feature.

In order to run this tool in a ‘vanilla’ Galaxy-instance you must ensure the Galaxy repository ‘htseq_bams_to_count_matrix’ (found in the Galaxy test toolshed at http://testtoolshed.g2.bx.psu.edu/view/fubar/htseq_bams_to_count_matrix) is installed. HOWEVER this is not a trivial activity, and you need administrator privileges on the Galaxy instance to do it.

If you are administrator on your instance and are confident of installing new repositories, it should take about 10 minutes.

  1. Convert the BAM file to SAM format*

NGS: SAM Tools>BAM-to-SAM 

  1. BAM File to Convert: batch1-accepted_hits.bam
  2. Include header in output
  3. Execute

Rename the SAM file to a more meaningful name, e.g. 'batch1: converted SAM'

*We convert from bam to sam format because the tool ‘htseq-count’ currently works a little more reliably with sam files.

  1. Repeat for all five other -accepted-hits files, renaming appropriately
  1. Count reads in features with htseq-count

 NGS: RNA Analysis > SAM/BAM to count matrix  

  1. Gene model (GFF) file to count reads over from your current history: genes.gtf
  2. bam/sam file from your history: batch1: converted SAM
  3. Additional bam/sam file from your history: batch2: converted SAM
  4. Specify additional bam/sam file inputs 1 -> Additional bam/sam file from your history: batch3: converted SAM
  5. Specify additional bam/sam file inputs 2 -> Additional bam/sam file from your history: chem1: converted SAM
  6. Specify additional bam/sam file inputs 3 -> Additional bam/sam file from your history: chem2: converted SAM
  7. Specify additional bam/sam file inputs 4 -> Additional bam/sam file from your history: chem3: converted SAM
  8. Keep rest defaults
  9. Execute

We now have a count matrix with one row per feature and one column of counts per sample. We will use this matrix in later sections to identify the differentially expressed genes.
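To make the idea of a count matrix concrete, here is a deliberately simplified sketch of read counting. It is not how htseq-count is implemented: reads and genes are reduced to plain intervals, strand and splicing are ignored, and a read is counted only if it overlaps exactly one gene. All coordinates and names are invented.

```python
from collections import Counter


def overlapping_genes(read, genes):
    """Return names of genes on the same chromosome whose interval overlaps the read."""
    chrom, start, end = read
    return [name for name, g_chrom, g_start, g_end in genes
            if g_chrom == chrom and start <= g_end and end >= g_start]


def count_reads(reads, genes):
    """Count reads per gene, skipping reads that hit no gene or several genes."""
    counts = Counter({name: 0 for name, *_ in genes})
    for read in reads:
        hits = overlapping_genes(read, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
    return counts


def count_matrix(samples, genes):
    """Build a header row plus one row per gene: [gene, count_sample1, ...]."""
    per_sample = {name: count_reads(reads, genes) for name, reads in samples.items()}
    header = ["gene"] + list(samples)
    rows = [[gene] + [per_sample[s][gene] for s in samples] for gene, *_ in genes]
    return [header] + rows


# Invented gene model and read positions: (name, chrom, start, end) and (chrom, start, end).
genes = [("geneA", "chrI", 100, 500), ("geneB", "chrI", 450, 900)]
samples = {
    "batch1": [("chrI", 120, 170), ("chrI", 460, 480), ("chrI", 600, 650)],
    "chem1": [("chrI", 700, 750), ("chrI", 800, 850), ("chrI", 1000, 1050)],
}
for row in count_matrix(samples, genes):
    print("\t".join(str(value) for value in row))
```

In this toy example the read at 460-480 overlaps both genes and is therefore left uncounted, which illustrates why the totals in a count matrix are usually lower than the number of aligned reads.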

Completed Galaxy history for this section (in SharedData>Published Histories): RNASeqDGE_ADVNCD_Sec4

OR

Import this History archive (History panel > cog icon in top right > Import from File): https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/Galaxy-History-RNAseqDGE_ADVNCD_Sec4.tar.gz

Section 5: Differential gene expression analysis using EdgeR  [30 min]

EdgeR is an R package for analysing differential expression in RNA-Seq data; it can use either exact statistical methods or generalised linear models.

In order to run this tool in a ‘vanilla’ Galaxy-instance you need to make sure the Galaxy repository ‘differential_count_models’ (http://testtoolshed.g2.bx.psu.edu/view/fubar/differential_count_models) is installed. This can take up to 40 minutes, as it is a very complex set of tools.

Read more about EdgeR here

  1. Generate a list of differentially expressed genes using EdgeR

          NGS RNA analysis -> Differential_Count  

  1. Select an input matrix - rows are contigs, columns are counts for each sample: bams to DGE count matrix_htseqsams2mx.xls
  2. Title for job outputs: Differential Counts-EdgeR
  3. Treatment Name: Batch
  4. Select columns containing treatment.: tick c2: Batch1-sam, c3: Batch2-sam, c4: Batch3-sam
  5. Control Name: Chem
  6. Select columns containing control.: tick c5: Chem1-sam, c6: Chem2-sam, c7: Chem3-sam
  7. Run this model using edgeR
  8. Keep rest defaults
  9. Execute

  1. Filter out the significant differentially expressed genes

           Filter and Sort -> Filter

  1. Filter: DifferentialCounts_topTable_edgeR.xls
  2. With following condition: c6 <= 0.05
  3. Number of header lines to skip: 1
  4. Execute

Rename the filtered file to a more meaningful name. eg EdgeR-SignificantlyExpressedGenes

  1. Check the generated list of differentially expressed genes

There should be ~55 differentially expressed genes in this list

  1. View the generated plots

Click on the eye icon next to DifferentialCountsEdgeR.html

MDS plot

  • An MDS plot shows the two most informative axes of mathematically optimised distance measurements between the samples in transcriptome space. Although there are potentially hundreds of dimensions, the top two often contain about half of all the available statistical information, so usually provide good separation between the samples.
  • For both conditions, the 3 replicates tend to be closer to each other than they are to replicates from the other condition. Note the difference in the scale of x-axis (dimension 1) and y-axis (dimension 2). Biological and technical variation are represented as distances between condition replicates.
  • The higher glucose (‘batch’) condition shows more variability between replicates than the low glucose condition suggesting that in starvation conditions, yeast has a limited range of metabolic options, so the replicate transcriptomes conform to that more uniform, specific pattern, compared to the many physiological options (such as growth and reproduction) available in glucose rich conditions.
  • Conditions separate into two distinct clusters suggesting a substantial experimental effect scaled to relatively small biological/technical variability within each treatment.
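An MDS plot starts from a table of pairwise distances between samples. The sketch below shows only that first step, using Euclidean distance on log2-transformed counts. This choice of distance is an assumption made for illustration and is not necessarily the measure used by the Galaxy tool; the counts are invented.

```python
import math
from itertools import combinations

# Invented counts for four genes in four samples, for illustration only.
counts = {
    "batch1": [100, 20, 5, 300],
    "batch2": [110, 25, 4, 280],
    "chem1": [10, 200, 50, 310],
    "chem2": [12, 180, 60, 290],
}


def log2_profile(sample_counts):
    """Log2-transform counts, adding 1 so that zero counts are defined."""
    return [math.log2(count + 1) for count in sample_counts]


def distance(first_counts, second_counts):
    """Euclidean distance between two samples on the log2 scale."""
    first = log2_profile(first_counts)
    second = log2_profile(second_counts)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(first, second)))


for first, second in combinations(counts, 2):
    print(f"{first}\t{second}\t{distance(counts[first], counts[second]):.2f}")
```

With these invented numbers the replicates of one condition are closer to each other than to samples from the other condition, which is the pattern the MDS plot makes visible in two dimensions.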

edgeR heatmap

  • The heatmap shows the differentially expressed genes as rows, and read counts in each sample as columns.
  • The counts are scaled using each row’s Z-score, which determines the colour for each sample for each gene. The Z-score indicates the number of standard deviations from the mean in either a positive or a negative direction. Samples are clustered and reordered using a hierarchical algorithm.
  • In this example, they separate into treatment and control groups nicely but this does not always happen.
  • Note that the six most differentially expressed genes, ASC1, GDH3, SSA1, YAT1, BDH2 and FUN19, are expressed in the Batch samples but not in the Chem samples. Given that our Batch samples are glucose-excess and our Chem samples are glucose-limited, we can conclude that oversupplying glucose in yeast results in differential expression of these six genes.
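The row-wise Z-score scaling mentioned above can be sketched as follows. This is a worked example rather than the tool's code: it assumes the sample standard deviation (dividing by n - 1) is used, and the six counts are invented to resemble a gene that is high in batch and low in chemostat.

```python
import statistics


def row_z_scores(row):
    """Scale one gene's counts to Z-scores: (value - row mean) / row standard deviation."""
    mean = statistics.mean(row)
    sd = statistics.stdev(row)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0] * len(row)  # a constant row carries no contrast
    return [(value - mean) / sd for value in row]


# Invented counts for one gene: three batch replicates, then three chem replicates.
gene_counts = [10, 12, 11, 1, 2, 0]
print([round(z, 2) for z in row_z_scores(gene_counts)])
```

Because every row is scaled independently, the colours in the heatmap show how each sample compares with that gene's own average, not how highly expressed the gene is compared with other genes.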

Completed Galaxy history for this section (in SharedData>Published Histories): RNASeqDGE_ADVNCD_Sec5

Section 6. Differential gene expression analysis using DeSeq2 [30 min]

DESeq2 is an R package that uses a negative binomial statistical model to find significant differences in counts of transcript reads between experimental conditions. It can work without replicates (unlike edgeR), but the authors strongly advise against this for reasons of statistical validity.

Read more about DESeq here
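One way to see why a negative binomial model is used for read counts is to compare its variance with that of a Poisson model. In the usual mean-dispersion parameterisation the negative binomial variance is mean + dispersion × mean², so it grows faster than the mean whenever the dispersion is above zero. The sketch below just evaluates that formula; the mean and dispersion values are arbitrary illustrations, not estimates from this dataset.

```python
def poisson_variance(mean):
    """Variance of a Poisson distribution equals its mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Variance of a negative binomial: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


# Arbitrary example values chosen for illustration.
for mean in (10, 100, 1000):
    print(mean, poisson_variance(mean), negative_binomial_variance(mean, 0.1))
```

The extra term is what lets the model absorb variation between biological replicates, and it is also why replicates are needed to estimate the dispersion reliably.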

  1. Generate a list of differentially expressed genes using DeSeq2

NGS: RNA Analysis ->Differential_Count

  1. Select an input matrix - rows are contigs, columns are counts for each sample: 56: bams to DGE count matrix_htseqsams2mx.xls
  2. Title for job outputs: Differential Counts-Deseq
  3. Treatment Name: Batch
  4. Select columns containing treatment.: tick c2: Batch1-sam, c3: Batch2-sam, c4: Batch3-sam
  5. Control Name: Chem
  6. Select columns containing control.: tick c5: Chem1-sam, c6: Chem2-sam, c7: Chem3-sam
  7. Run this model using edgeR: Do not run EdgeR
  8. Run the same model with DESeq2 and compare findings: Run DESeq2
  9. Keep rest defaults
  10. Execute

     

  1. Filter out the significant differentially expressed genes

               Filter and Sort -> Filter

  1. Filter: 60: DifferentialCounts_topTable_DESeq2.xls
  2. With following condition: c6 <= 0.05
  3. Number of header lines to skip: 1
  4. Execute

                 Rename the filter file to a more meaningful name. eg: Deseq-SignificantlyExpressedGenes

  1. Check the generated list of differentially expressed genes:

There should be ~50 genes in this file. You should see the first few differentially expressed genes in Deseq are similar to the ones identified by EdgeR.

  1. View the generated plots
  1. Click on DifferentialCountsDeseq.html to view the plots

Completed Galaxy history for this section (in SharedData>Published Histories): RNASeqDGE_ADVNCD_Sec6

Section 7: How much overlap is there in the 3 differential expression outputs?

To see how much the outputs of the three differential expression tools used in this tutorial (CuffDiff, EdgeR and DESeq2) overlap with each other, we can generate a Venn diagram.

  1. Generate a venn-diagram of the output of the 3 differential expression tools.

                Graph/Display Data -> proportional venn

  1. title: CommonSection
  2. input file 1: CuffDiff-SignificantlyExpressedGenes
  3. column index: 2
  4. as name: CuffDiff
  5. input file 2: EdgeR-SignificantlyExpressedGenes
  6. column index file 2: 0
  7. as name: EdgeR
  8. two or three: three
  9. input file 3: Deseq-SignificantlyExpressedGenes
  10. column index file 3: 0
  11. as name file 3: Deseq
  12. Execute

Note: Column index 2 (or c3) contains the gene name in the CuffDiff output. Similarly column index 0 (or c1) in EdgeR and Deseq contains the gene names.

  1. View the generated venn-diagram:

   

Agreement between the tools is good: there are ~50 differentially expressed genes that all three tools agree upon, and only a handful that are exclusive to each tool

  1. Generate the common list of significantly expressed genes identified by the three mentioned tools by extracting the respective gene list columns and intersecting:
  1. Text Manipulation -> cut
  1. cut columns: c3
  2. Delimited by: Tab
  3. From: CuffDiff-SignificantlyExpressedGenes
  4. Execute
  • Rename to something more meaningful eg cuffDiff-Column3
  1. Text Manipulation -> cut
  1. cut columns: c1
  2. Delimited by: Tab
  3. From: EdgeR-SignificantlyExpressedGenes
  4. Execute
  • Rename to something more meaningful eg EdgeR-Column1
  1. Text Manipulation -> cut
  1. cut columns: c1
  2. Delimited by: Tab
  3. From: Deseq-SignificantlyExpressedGenes
  4. Execute
  • Rename to something more meaningful eg Deseq-Column1
  1. Join, Subtract and Group -> Compare two Datasets
  1. Compare: cuffDiff-Column3
  2. against: EdgeR-Column1
  3. keep rest defaults
  4. Execute
  • Rename to something more meaningful: eg Common genes CuffDiff and EdgeR
  1. Join, Subtract and Group -> Compare two Datasets
  1. Compare: Common genes CuffDiff and EdgeR
  2. against: Deseq-Column1
  3. keep rest defaults
  4. Execute
  • Rename to Common genes all methods.

This is the list of the ~50 genes that have been identified as significantly differentially expressed by all three tools.
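The Venn diagram and the cut/compare steps in this section both come down to set operations on three gene lists. Here is a minimal sketch of those operations in Python; the function names are hypothetical and the gene lists are invented placeholders, not results from this tutorial.

```python
def venn_regions(a, b, c):
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


def common_genes(first, *others):
    """Keep genes from `first`, in their original order, that occur in every other list."""
    other_sets = [set(other) for other in others]
    return [gene for gene in first if all(gene in s for s in other_sets)]


# Invented gene lists standing in for the three filtered outputs.
cuffdiff = ["geneA", "geneB", "geneC", "geneD"]
edger = ["geneB", "geneC", "geneD", "geneE"]
deseq = ["geneC", "geneD", "geneF"]

print(venn_regions(cuffdiff, edger, deseq))
print(common_genes(cuffdiff, edger, deseq))
```

`common_genes` mirrors the two chained ‘Compare two Datasets’ steps: the first list is intersected with the second, and the result with the third.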

Completed Galaxy history for this section (in SharedData>Published Histories): RNASeqDGE_ADVNCD_Sec7

Or

Import this History archive (History panel > cog icon in top right > Import from File): https://swift.rc.nectar.org.au:8888/v1/AUTH_a3929895f9e94089ad042c9900e1ee82/RNAseqDGE_ADVNCD/Galaxy-History-RNAseqDGE_ADVNCD_Complete.tar.gz

Section 8: Biological interpretation using gene set enrichment analysis

The biological question being asked in the original paper is essentially: ‘what is the global response of the yeast transcriptome in the shift from growth at glucose excess conditions (batch) to glucose-limited conditions (chemostat)?’

We can address this question by attempting to interpret our differentially expressed gene list at a higher level, perhaps by examining the categories of gene and protein networks that change in response to glucose.

For example, we can input our list of differentially expressed genes to a Gene Ontology(GO) enrichment analysis tool such as GOrilla to find out the GO enriched terms.

NOTE: Because of time constraints in this tutorial, the analysis was confined to a single chromosome (chromosome I). As a consequence, we don’t really have sufficient information to look for groups of differentially expressed genes, simply because we don’t have enough genes identified from the one chromosome to look for statistically convincing overrepresentation of any particular gene group.

For the next step, you need to import another Galaxy history containing the complete analysis of the dataset for this experiment (using all chromosomes). If you use the gene lists from the tutorial you have done above, you won’t be able to identify any statistically significant GO groups.

  1. Import the Galaxy history ‘RNASeq-Advanced_CompleteGenome_Complete-NoInputfiles’**
  1. Step by step histories corresponding to the complete genome analysis are also available if you want to go through the full analysis, but each analysis step takes much longer.
  2. Note that there are ~2500 significantly differentially expressed genes identified in the full analysis
  3. Also note that the genes are ranked in order of statistical significance. This is critical for the next step.

** if you can’t find the RNASeq-Advanced_CompleteGenome_Complete-NoInputfiles history on your Galaxy instance, you can directly import the full differentially expressed gene list here: https://swift.rc.nectar.org.au:8888/v1/AUTH_377/public/RNASeq_ADVNCD/EdgeR-SignificantlyExpressedGenes-columnOne.tabular

  1. Explore the data with gene set enrichment analysis (GSEA) using the online tool GOrilla
  1. Go to http://cbl-gorilla.cs.technion.ac.il/
  2. Choose Organism: Saccharomyces cerevisiae
  3. Choose running mode -> Single ranked list of genes
  4. Paste a ranked list of gene/protein names

From the RNASeq-Advanced_CompleteGenome_Complete-NoInputfiles history:

  1. Click on EdgeR-SignificantlyExpressedGenes-columnOne
  2. Press Ctrl+A to select all the genes, then copy them (Ctrl+C)
  3. Paste the list into the text box
  1. Choose an Ontology -> Process
  2. Click on Search Enriched GO terms

You will be redirected to a page depicting the GO enriched biological processes and their significance, based on the genes you listed. As an example, small molecule catabolic process and organic acid catabolic process genes have been identified.
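To give a feel for what ‘enriched’ means here, the sketch below computes a plain hypergeometric tail probability: the chance of seeing at least a given number of genes from one GO term in a selected gene list if genes were picked at random. This is a simplification. GOrilla's ranked-list mode is described by its authors as using a minimum hypergeometric statistic that considers many list cut-offs, which this example does not attempt. All numbers are invented.

```python
from math import comb


def hypergeom_tail(total, annotated, selected, hits):
    """P(X >= hits) when `selected` genes are drawn at random from `total` genes,
    of which `annotated` carry the GO term of interest."""
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, k) * comb(total - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, selected)


# Invented example: 6000 genes in total, 100 annotated with the term,
# 200 genes selected as differentially expressed, 10 of them annotated.
p_value = hypergeom_tail(total=6000, annotated=100, selected=200, hits=10)
print(f"{p_value:.3g}")
```

A small probability means the term appears in the list more often than random selection would explain; in practice the values are also corrected for the large number of GO terms tested.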

       

  1. Experiment with different ontology categories (Function, Component) in GOrilla. 

At this stage you are interpreting the experiment in different ways, potentially discovering information that will lead you to further lab experiments. This is driven by your biological knowledge of the problem space. There is an unlimited number of methods for further interpretation, of which GSEA is just one.

References

[1] Nookaew I, Papini M, Pornputtpong N, Scalcinati G, Fagerberg L, Uhlén M, Nielsen J: A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae. Nucleic Acids Res 2012, 40(20):10084-10097. doi:10.1093/nar/gks804. Epub 2012 Sep 10.