1
Species | dbkey | bowtie-index | bowtie2 | bwa_index | liftover | rsem | .2bit | .fa | .fa.fai | gtf | gff
2
Arabidopsis thaliana | at9 | yes | yes | yes | yes | yes | yes | yes | yes
3
2016 | 2016 | 2016 | 2016 | 02/01/2018 | 02/01/2018
4
from Ensembl ftp://ftp.ensemblgenomes.org/pub/release-38/plants/gtf/arabidopsis_thaliana/
from TAIR portal https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes
5
at10 | yes | yes | yes | yes | yes | yes | yes | yes
6
2016 | 2016 | 2016 | 02/01/2018 | 02/01/2018
7
from Federico (2016) | from Federico (2016)
from Ensembl ftp://ftp.ensemblgenomes.org/pub/release-38/plants/gtf/arabidopsis_thaliana/
from TAIR portal https://www.arabidopsis.org/download/index-auto.jsp?dir=%2Fdownload_files%2FGenes
8
Drosophila melanogaster | dm2 | yes | no
9
2016
10
dm3 | yes | yes | yes | yes | yes | yes | yes | yes | yes | no
11
2016 | 2016 | 2016 | 2016 | 11/03/2018 | 2016 | 2016 | 2016 | 11/03/2018
12
1 Mar 2018 (mtangaro) - Added dm3ToDm6.over.chain from http://hgdownload.soe.ucsc.edu/goldenPath/dm3/liftOver/

rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/hgTables
Select Fields from "genome".refGene --> name
genome.refFlat fields --> geneName
Linked Tables --> refFlat
the table and the .gtf must be downloaded on the same day

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables
13
Homo sapiens | hg18 | yes | yes | yes | yes | yes | yes | yes | yes | yes | no
14
2016 | 2016 | 2016 | 2016 | 11/03/2018
15
rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/hgTables
Select Fields from "genome".refGene --> name
genome.refFlat fields --> geneName
Linked Tables --> refFlat
the table and the .gtf must be downloaded on the same day

from Federico (2016) | from Federico (2016) | from Federico (2016)

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables
16
hg19 | yes | yes | yes | yes | yes | yes | yes | yes | yes | no
17
2016 | 2016 | 2016 | 2016 | 11/03/2018
18
rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/hgTables
Select Fields from "genome".refGene --> name
genome.refFlat fields --> geneName
Linked Tables --> refFlat
the table and the .gtf must be downloaded on the same day

from Federico (2016) | from Federico (2016) | from Federico (2016)

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables
19
hg38 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes
20
from Federico (2016) | from Federico (2016) | from Federico (2016) | from Federico (2016) | 11/03/2018 | 2016 | 2016 | from Federico (2016) | 11/03/2018 | 14/01/2018
21
rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/hgTables
Select Fields from "genome".refGene --> name
genome.refFlat fields --> geneName
Linked Tables --> refFlat
the table and the .gtf must be downloaded on the same day

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables

pietro 14.01.2018, from Ensembl portal ftp://ftp.ensembl.org/pub/release-91/gff3/
22
Mus musculus | mm8 | yes | no
23
from Federico (2016)
24
mm9 | yes | yes | yes | yes | yes | yes | yes | yes | yes | no
25
2016 | 2016 | 2016 | 2016 | 11/03/2018 | from Federico (2016) | from Federico (2016) | from Federico (2016) | 11/03/2018
26
rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/hgTables
Select Fields from "genome".refGene --> name
genome.refFlat fields --> geneName
Linked Tables --> refFlat
the table and the .gtf must be downloaded on the same day

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables
27
mm10 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes
28
2016 | 2016 | 2016 | 2016 | 11/03/2018 | 2016 | 2016 | 11/03/2018 | 11/03/2018 | 14/01/2018
29
mm10ToHg19.over.chain mm10ToMm9.over.chain

rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/hgTables
Select Fields from "genome".refGene --> name
genome.refFlat fields --> geneName
Linked Tables --> refFlat
the table and the .gtf must be downloaded on the same day

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables

from Ensembl portal ftp://ftp.ensembl.org/pub/release-91/gff3/
30
Saccharomyces cerevisiae | sacCer1 | yes
31
2016
32
sacCer1ToSacCer2.over.chain sacCer1ToSacCer3.over.chain
33
sacCer2 | yes
34
2016
35
sacCer2ToSacCer3.over.chain
36
sacCer3 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes
37
2016 | 2016 | 2016 | 2016 | 11/03/2018 | 2016 | 2016 | 2016 | 11/03/2018 | 14/01/2018
38
sacCer3ToSacCer2.over.chain

rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/
Select Fields from sacCer3.sgdGene --> name
Linked Tables --> sgdToName
sacCer3.sgdToName fields --> value
the table and the .gtf must be downloaded on the same day

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables

pietro 14.01.2018, from Ensembl portal ftp://ftp.ensembl.org/pub/release-91/gff3/
39
Drosophila melanogaster | dm6 | yes | yes | yes | yes | yes | yes | yes | yes | yes
40
02/02/2018 | 22/01/2018 | 19/01/2018 | 01/03/2018 | 11/03/2018 | 02/03/2018 | 01/03/2018 | 11/03/2018 | 14/01/2018
41
Bowtie (1.2.0) from conda.

Bowtie2 (2.3.0) from conda

1 Mar 2018 (mtangaro) - Added dm6ToDm3.over.chain from http://hgdownload.soe.ucsc.edu/goldenPath/dm6/liftOver/
Still to add to CVMFS server 90.147.102.186

rsem version: 1.3.0
command: rsem-prepare-reference --gtf (.gtf) --transcript-to-gene-map (table.txt) --bowtie (.fa) dm6
.fa download date: 11/03/2018
.fa download url: from UCSC, http://hgdownload.cse.ucsc.edu/goldenPath/ (for assemblies whose genome is not available in FASTA format, the .2bit file was downloaded and converted to .fa with the UCSC tool twoBitToFa)
link to the program: https://genome.ucsc.edu/goldenpath/help/twoBit.html
gtf download date: 11/03/2018
gtf download link: UCSC Table Browser
table download date: 11/03/2018
table download link: https://genome.ucsc.edu/cgi-bin/hgTables
Select Fields from "genome".refGene --> name
genome.refFlat fields --> geneName
Linked Tables --> refFlat
the table and the .gtf must be downloaded on the same day

http://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/dm6.2bit

http://hgdownload.cse.ucsc.edu/goldenPath/dm6/bigZips/dm6.fa.gz

from UCSC Table Browser https://genome.ucsc.edu/cgi-bin/hgTables

pietro 14.01.2018, from Ensembl portal ftp://ftp.ensembl.org/pub/release-91/gff3/
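The rsem notes above pair a Table Browser export (refGene name, refFlat geneName) with the --transcript-to-gene-map option. The following is a minimal sketch of how such an export might be turned into a map file, not the procedure actually used for this sheet. It assumes the export is tab-separated with the transcript name in the first column and the gene name in the second, that any header line starts with "#", and that RSEM expects one "gene_id<TAB>transcript_id" pair per line. The function name and the identifiers in the example are made up for illustration.

```python
import csv
import io


def table_to_rsem_map(table_text):
    """Turn a two-column export (transcript, gene) into RSEM map lines.

    Assumed input: tab-separated text, transcript name first, gene name
    second, optional header lines starting with '#'.
    Output: one 'gene_id<TAB>transcript_id' line per unique pair.
    """
    seen = set()
    out_lines = []
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        # Skip blank lines, header/comment lines and incomplete rows.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        transcript_id, gene_id = row[0].strip(), row[1].strip()
        if not transcript_id or not gene_id:
            continue
        pair = (gene_id, transcript_id)
        if pair in seen:
            # The joined table can repeat a pair; keep the first only.
            continue
        seen.add(pair)
        out_lines.append(f"{gene_id}\t{transcript_id}\n")
    return "".join(out_lines)


# Made-up identifiers, only to show the column swap.
example = (
    "#genome.refGene.name\tgenome.refFlat.geneName\n"
    "TX_0001\tGENE_A\n"
    "TX_0002\tGENE_A\n"
    "TX_0001\tGENE_A\n"
)
rsem_map = table_to_rsem_map(example)
```

The result could be written to table.txt and passed to rsem-prepare-reference. Because the transcript names must match those in the .gtf, this also illustrates why the notes ask for the table and the .gtf to be downloaded on the same day.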
42
Indices were created from the FASTA file downloaded from the UCSC Table Browser, from the terminal, with the command:

(__bowtie@1.2.0) (.venv) [galaxy@galaxy-server bin]$ ./bowtie-build /export/dm6fasta/dm6.fa dm6

The .gtf files were cleaned to build the rsem index and to get a better annotation.
Cleaning method:
1) run the script chr_list.py to get the list of chromosomes in the .gtf
→ command: python chr_list.py file.gtf
2) delete the extra-chromosome lines with sed (from the terminal: sed -i '/extra-chromosome/d' file.gtf, where extra-chromosome is the name of the sequence to drop)
3) run rewrite_gtf.py, redirecting the output to a new .gtf file
→ command: python rewrite_gtf.py file.gtf > new_file.gtf
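Steps 1 and 2 of the cleaning method can be sketched in Python as follows. The scripts chr_list.py and rewrite_gtf.py are not reproduced in this sheet, so this is only an illustration of the idea under stated assumptions, not their actual content. The function names and the example records are hypothetical. The sketch assumes a standard tab-separated GTF with the sequence name in the first column.

```python
from collections import Counter


def list_chromosomes(gtf_lines):
    """Count records per sequence name (first GTF column)."""
    counts = Counter()
    for line in gtf_lines:
        # Ignore blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


def drop_chromosomes(gtf_lines, unwanted):
    """Return the lines whose sequence name is not in `unwanted`.

    Comment lines are kept. The match is exact and on the first column
    only, unlike a sed pattern, which matches anywhere in the line.
    """
    unwanted = set(unwanted)
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] not in unwanted:
            kept.append(line)
    return kept


# Made-up records: two on a main chromosome, one on an extra sequence.
gtf = [
    'chr2L\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chrUn_example\tsrc\texon\t5\t50\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
    'chr2L\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
chromosomes = list_chromosomes(gtf)
cleaned = drop_chromosomes(gtf, ["chrUn_example"])
```

Filtering on the first column avoids deleting a record whose attributes merely contain the unwanted name, which a plain sed pattern match could do.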