Frequently Asked Questions
Training session: Reference-based RNA-seq with Galaxy
Presenters: Fotis Psomopoulos & Bérénice Batut
-----
Q: The audio in the video seems off
A: We fixed the issue, but your browser may still be showing you the old version. Please use this video: https://www.youtube.com/watch?v=j4onRSN650A&feature=youtu.be
Q: My jobs are not running / I cannot see the history overview menu / I cannot name my history
A: Please make sure you are logged in. At the top menu bar, you should see a section labeled “User”. If you see “Login/Register” here you are not logged in.
Q: I am using the RNA STAR tool to map my reads to the reference genome. While filling in the parameters, I have to set "Length of the genomic sequence around annotated junctions", which apparently has to be 36. I'm not sure why it is 36, what it means, or why it is relevant. Does anyone have any ideas? Why should it be the read length minus 1?
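A: RNA STAR uses the gene model to build a database of splice junctions, adding the genomic sequence on either side of each annotated junction to the index. A read that spans a junction has at least 1 base on the other side, so it can extend at most (read length - 1) bases past one side of the junction. Longer flanking sequences are therefore not needed: with 37 bp reads, that gives 36.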
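The rule above can be written down as a small calculation. This is a minimal sketch, assuming the reads are available as FASTQ text (4 lines per record); the helper names are hypothetical. The Galaxy field corresponds to STAR's `--sjdbOverhang` option, for which the STAR manual recommends the maximum read length minus 1 (for paired-end data, the length of one mate, e.g. 99 for 2x100 bp reads).

```python
def read_lengths_from_fastq(fastq_text: str) -> list[int]:
    """Return the sequence length of every record in a FASTQ string.

    Assumes plain 4-line records: header, sequence, '+', quality.
    """
    lines = [line.rstrip("\n") for line in fastq_text.splitlines() if line.strip()]
    # The sequence is the second line of each 4-line record.
    return [len(lines[i + 1]) for i in range(0, len(lines) - 3, 4)]


def suggested_sjdb_overhang(read_lengths: list[int]) -> int:
    """Maximum read (or mate) length minus 1, as recommended in the STAR manual."""
    if not read_lengths:
        raise ValueError("no reads given")
    return max(read_lengths) - 1
```

With reads of 37 bp (and some shorter, trimmed ones), the suggested value is 36; using the maximum rather than the typical length means no read is too long for the stored flanks.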
Q: Is Infer Experiment ever used in practice? Most of the time you know whether the RNA-seq data is stranded in the first place, because you sequenced it yourself or ordered it from a company, right?
A: It is useful when you get the data from someone else and they don't know whether the library is stranded.
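Infer Experiment (RSeQC's infer_experiment.py) reports the fraction of reads explained by each strand pattern. The sketch below pulls those fractions out of the text report so they can be compared across samples. It assumes the report lines look like `Fraction of reads explained by "++,--": 0.9669`, as in the RSeQC documentation; the function name is hypothetical and the numbers in the usage note are illustrative.

```python
import re

# Matches e.g.: Fraction of reads explained by "++,--": 0.9669
_EXPLAINED = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')
# Matches e.g.: Fraction of reads failed to determine: 0.0170
_FAILED = re.compile(r"Fraction of reads failed to determine:\s*([0-9.]+)")


def parse_infer_experiment(report: str) -> dict[str, float]:
    """Map each strand pattern (and 'failed') to its reported fraction."""
    fractions: dict[str, float] = {}
    for line in report.splitlines():
        match = _EXPLAINED.search(line)
        if match:
            fractions[match.group(1)] = float(match.group(2))
            continue
        match = _FAILED.search(line)
        if match:
            fractions["failed"] = float(match.group(1))
    return fractions
```

For a single-end report with `"++,--": 0.9500` and `"+-,-+": 0.0300`, the function returns both patterns plus the undetermined fraction; nearly all assignable reads follow one pattern, which points to a stranded library.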
Q: Is it possible to visualize the RNA STAR BAM file using the JBrowse tool?
A: Yes, that should work.
Q: I am trying to check the strandedness of my libraries and I get unequal numbers from Infer Experiment, but in IGV the data look unstranded. What does this mean?
A: Unequal numbers are common: removal of the second strand during library preparation is often incomplete, and there are genuine cases of bidirectional transcription in the genome. A 70/30 % split, as in your report, is not a good result for a stranded library. You can treat it as a stranded library in your analysis, but you could not, for instance, conclude that a given gene is actually transcribed from the reverse strand. Most likely the library preparation did not work perfectly. This can depend on many factors; one is that you need to digest the DNA completely with a high-quality DNase before the reverse transcription.
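The reasoning above (a clean stranded library puts nearly all reads in one pattern, an unstranded one splits them roughly 50/50, and 70/30 sits in between) can be sketched as a small classifier. The thresholds `stranded_min=0.9` and `unstranded_band=0.1` are illustrative assumptions, not RSeQC or Galaxy defaults, and the function name is hypothetical.

```python
def describe_strandedness(frac_a: float, frac_b: float,
                          stranded_min: float = 0.9,
                          unstranded_band: float = 0.1) -> str:
    """Classify a library from the two 'explained by' fractions of Infer Experiment.

    frac_a, frac_b: the two pattern fractions (e.g. "++,--" and "+-,-+").
    The default thresholds are assumptions chosen for illustration.
    """
    total = frac_a + frac_b
    if total == 0:
        raise ValueError("no reads were assigned to either strand pattern")
    # Rescale so the two shares sum to 1, ignoring undetermined reads.
    share_a = frac_a / total
    dominant = max(share_a, 1 - share_a)
    if dominant >= stranded_min:
        return "stranded"
    if abs(share_a - 0.5) <= unstranded_band:
        return "unstranded"
    return "ambiguous: partially stranded, check the library preparation"
```

With these thresholds, 0.95/0.03 is reported as stranded, 0.49/0.49 as unstranded, and a 70/30 split as ambiguous, matching the interpretation in the answer.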
Q: Regarding DESeq2, the tutorial used the normalised count table. Some people use VST- or rlog-normalised counts for visualisation (heatmaps); would you recommend this? A second question, about heatmap2: I think this depends on the data you analyse, but do you have any advice on how to choose the clustering method and the distance method?
A: This depends on what you would like to do with the table. The DESeq2 wrapper in Galaxy can output all of these, and the DESeq2 vignette has a good discussion of this topic: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#count-data-transformations
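To make the difference between the tables concrete, here is a minimal sketch of how a normalised count table is obtained with DESeq2's default median-of-ratios size factors, followed by the simple shifted log, log2(normalised count + 1), which is what DESeq2's normTransform computes. It is not the DESeq2 code itself, the function names are hypothetical, and VST and rlog (which additionally stabilise the variance of low counts) are not reproduced here; use DESeq2 for those.

```python
import math
import statistics


def size_factors(counts: list[list[float]]) -> list[float]:
    """Median-of-ratios size factors; counts[g][s] is the raw count of gene g in sample s.

    Genes with a zero in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("every gene has a zero count in at least one sample")
    # Median of the log ratios, back-transformed, gives each sample's factor.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalised_counts(counts: list[list[float]], factors: list[float]) -> list[list[float]]:
    """Divide each sample's counts by its size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def log2_plus_one(matrix: list[list[float]]) -> list[list[float]]:
    """Shifted log transformation, log2(x + 1), applied element-wise."""
    return [[math.log2(x + 1) for x in row] for row in matrix]
```

If one sample was simply sequenced twice as deeply, its size factor comes out twice as large and the normalised counts of the two samples become equal; the log transform then compresses the range so a few highly expressed genes do not dominate a heatmap.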
Q: I have RNA-seq data from a pilot experiment on differential expression in a host-associated bacterium. One dataset comes from a bacterial culture, the other from bacteria isolated from the host (a plant). I expect strong contamination of the second sample with host RNA reads. Should I filter out the host reads before performing the analysis (and if so, which tools could I use for that), or can I just ignore the contamination, since I will map the reads to the bacterial genome and any host reads will be disregarded?
A: You can map both sets of reads to the bacterial reference genome. You are right: no pre-filtering is required, because the host reads should not map to the reference and will be excluded.
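A quick way to see how much of each sample was excluded is to compare the fraction of input reads that mapped; the host-derived sample should show a clearly lower value than the culture. This sketch assumes the alignments are available as SAM text (for example after converting the BAM with `samtools view -h`) and that the number of input reads is known from the FASTQ. It counts read names rather than records because unmapped reads may not be written to the BAM at all (STAR omits them unless `--outSAMunmapped Within` is set). The function name is hypothetical.

```python
def fraction_mapped(sam_text: str, total_input_reads: int) -> float:
    """Fraction of input reads (or read pairs) with at least one primary mapped alignment."""
    if total_input_reads <= 0:
        raise ValueError("total_input_reads must be positive")
    mapped_names: set[str] = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip empty and header lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        if flag & (0x100 | 0x800):
            continue  # secondary or supplementary alignment, counted via its primary
        # Mates share a read name, so a pair is counted once.
        mapped_names.add(qname)
    return len(mapped_names) / total_input_reads
```

Counting distinct names also avoids double-counting reads with several alignments.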
Q: I am actually working with single-end reads, and I have a question about featureCounts. The tutorial uses the featureCounts tool to count the number of reads per gene. For single-end reads, should we enable 'fragments counting instead of reads' or keep the default (disabled)?
A: