ABCDEFGHIJKLMNOPQRSTUVWXYZAAABACADAEAFAGAHAI
1
Timestamp | annotator | Topology: Is this a gene tree or species tree? | Topology: Is this topology rooted or not? | Alignment Method: name of software used, version of program | Tree Inference Method: parameters used, including model of evolution, and optimality criterion | Tree Inference Method: character weights if (normally then morphological) characters were weighted. | Tree Inference Method: name of software used, version of program | Alignment Method: parameters used (or default if default values were used). | Alignment Method: whether alignment was manually corrected or edited | Character matrix: Data type must be provided, for example DNA, RNA, protein, morphology, etc. | Character matrix: For molecular matrices, the accession numbers (and respective database(s) if different from GenBank) of the sequences used for each row must be provided | Character matrix: a mapping that relates each row identifier to a tip of the topology | Character matrix: a mapping that relates each accession number or specimen identifier to a row label | OTUs: A meaningful external identifier (a combination of database or resource and identifier/accession within that database). | OTUs: For specimens, museum, collection (if applicable), and specimen identifier. | OTUs: Precise (GPS) georeferences for specimens are highly desirable (but not always available). | Branch lengths: Some measure of branch length required unless it is not applicable to the analysis method. | Branch support: Some value of branch support should be provided, for example posterior probability, or bootstrap value, unless it is not applicable to the analysis method. | Topology: database record for tree if available | a local name or identifier for this tree | Publication: author list | Publication: year | Publication: title | Publication: citation (free text) | Publication: DOI | Other information for the purposes of this study | Topology URI | Character matrix URI | Other information on the location of data resources | Topology: nature of topology as representation of inference method | Other annotations of the tree as a whole | Other annotations of the character matrix as a whole | Other annotations of the OTUs | Other annotations of the branches
2
1/29/2013 11:00:00Andreaspecies treerootedn/aparameters are not available in the publicationnot applicablePAUP v4.0b10n/an/aThis is a divide-and-conquer supertree based on earlier studies (largely Beck et al., 2006) that used DNA and morphological characters. From the supplement: "All supertrees were constructed using standard, unweighted matrix representation with parsimony (MRP [30,31]), where the topologies of the source trees were converted into a partial binary matrix: species descended from a given node were coded as 1; those that were not, but were present on the tree were coded as 0; and all other taxa were coded as ? for that source tree. Except for the Beck et al. [49], Muridae, Perissodactyla, and Primates supertrees, a hypothetical all-zero outgroup was added to each matrix consisting of the concatenated matrix representations of the source trees; the former four analyses instead used a semirooted form of MRP [59], where only robustly rooted source trees were rooted with an all-zero outgroup; otherwise, the outgroup received ‘?’. All matrices were analyzed using a parsimony criterion with the search strategy being tailored to the size of the matrix. For the larger groups, we used the parsimony ratchet [60] to facilitate a more efficient search of tree space. Where appropriate, we also used safe taxonomic reduction [61] as implemented in the Perl script PerlEQ v1.0.x (Jeffries and Wilkinson, unpubl.) to identify poorly known species that would contribute to substantial loss of local resolution. We used the results of this analysis as a guide for pruning potentially problematic species from the source trees before subsequent recoding and re-analysis of the MRP matrices (following ref. 62) to improve resolution. In all cases, the final tree for each group was a strict consensus of all equally most parsimonious trees. Additional detail on individual search strategies can be found in the respective publications."
n/an/an/aCollections information is not providedGeoreference information is not providedBranch lengths are applicable and provided. Branch support values are not applicable.Mammals supertreeBininda-Emonds; Cardillo; Jones; MacPhee; Beck; Grenyer; Price; Vos; Gittleman; Purvis2007The delayed rise of present-day mammals.Bininda-Emonds, O. R. P., Cardillo, M., Jones, K. E., MacPhee, R. D. E., Beck, R. M. D., Grenyer, R., Price, S. A., et al. (2007). The delayed rise of present-day mammals. Nature, 446(7135), 507–12. doi:10.1038/nature0563410.1038/nature05634No character matrix because it's a supertree; supplementary data is available at http://www.nature.com/nature/journal/v446/n7135/suppinfo/nature05634.html

There's a substantial supplementary file on methods located here: http://www.nature.com/nature/journal/v446/n7135/extref/nature05634-s1.pdf
consensus treeFrom the paper: "The supertree was constructed in a hierarchical framework, combining pre-existing supertrees for Carnivora, Chiroptera, ‘Insectivora’ (split into Afrosoricida and Eulipotyphla) and Lagomorpha with new ones for the remaining groups, including the base supertree of all extant families (see Supplementary Table 1). All new supertrees were built using an explicit source tree collection protocol [29] to minimize both data duplication (for example, where the same data set underlies more than one source tree) and the inclusion of source trees of lesser quality (for example, taxonomies or those based on appeals to authority). Species names in the source trees were standardized to those found in ref. 23, and extinct taxa (following the 2004 IUCN Red List; http://www.redlist.org) were pruned from the final supertree. All supertrees were obtained using Matrix Representation with Parsimony (MRP [30,31]), with the parsimony analyses for the new supertrees being performed in PAUP* v4.0b10 (ref. 32)."

The Beck et al. tree, which this tree uses as a foundation, used both DNA and morphological characters.
A list of source trees is available in Supplementary file 1, Table 1.The authors use Wilson, D. E. & Reeder, D. M. (eds) Mammal Species of the World: a Taxonomic and Geographic Reference (Smithsonian Institution Press, Washington, 1993) as their namespace.This is an ultrametric tree; branch lengths represent time; see methods on pages 510-511 for more explanation
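Illustrative note on row 2: the MRP coding quoted from the supplement (1 for species descended from a node, 0 for species present on the source tree but outside that node, ? for taxa absent from that source tree) can be sketched in Python as follows. This is a minimal sketch, not the authors' pipeline: the nested-tuple tree representation, the function names, and the toy taxa are assumptions made for this example, the root node is skipped because it carries no grouping information, and the all-zero outgroup is left out.

```python
def leaves(node):
    """Return the set of tip labels under a node (a label string or a tuple of child nodes)."""
    if isinstance(node, str):
        return {node}
    tips = set()
    for child in node:
        tips |= leaves(child)
    return tips


def internal_clades(tree):
    """Collect the tip sets of all internal nodes below the root, in preorder."""
    clades = []

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            clades.append(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return clades


def mrp_columns(tree, all_taxa):
    """Code one source tree as MRP columns, one character per internal node.

    '1' = taxon descends from the node, '0' = taxon is on the tree but outside
    the node, '?' = taxon does not occur on this source tree.
    """
    present = leaves(tree)
    matrix = {taxon: "" for taxon in all_taxa}
    for clade in internal_clades(tree):
        for taxon in all_taxa:
            if taxon in clade:
                matrix[taxon] += "1"
            elif taxon in present:
                matrix[taxon] += "0"
            else:
                matrix[taxon] += "?"
    return matrix


# Hypothetical toy source tree ((A,B),(C,D)); taxon E occurs only in other source trees.
source_tree = (("A", "B"), ("C", "D"))
taxa = ["A", "B", "C", "D", "E"]
matrix = mrp_columns(source_tree, taxa)
for taxon in taxa:
    print(taxon, matrix[taxon])
```

In a full supertree analysis the columns from every source tree would be concatenated per taxon before the parsimony search; that step is omitted here.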
3
1/29/2013 11:29:09Enricospecies treerootedN/ANot described, most likely manually curatedUnknownNot ApplicableMeaningful external identifiers are not providedNot ApplicableGeoreference information is not providedBranch lengths are not applicable.Branch support values are not applicable.APGIIIBirgitta Bremer, Kåre Bremer, Mark W. Chase, Michael F. Fay, James L. Reveal, Douglas E. Soltis, Pamela S. Soltis, Peter F. Stevens2009An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IIIBotanical Journal of the Linnean Society, 2009, 161, 105–121A revised and updated classification for the families of flowering plants is providedhttp://phylotastic.org/hack2/arbitrary_hash/APGIII#apgiiiRevised and Updated classification of flowers and plantsThis is a phylogenetically informed taxonomy framework.
4
1/29/2013 13:32:44Arlinspecies treerootedMAFFT v6.712bfrom the publication section "Tree reconstruction."
"Phylogenetic inference of subset 1 and of subset 2 was done under the maximum likelihood (ML) optimality criterion in partitioned analyses with RAxML 7.2.8 [42,43] under the GTRCAT model. Analyses were computed on HPC Linux clusters, 8 nodes with 12 cores each, at the Regionales Rechenzentrum Köln (RRZK) using Cologne High Efficient Operating Platform for Science (CHEOPS); input was done in phylip format; and conversion of Fasta to phylip was done using Readseq [44] [XVII]. Nuclear coding genes were treated as one partition (PROTCAT model, substitution matrix LG + F, taken from ProtTest [45]). All other groups of orthologs were treated as separate partitions (32 partitions in total). (See Additional file 4 for the character partitions of subset 1 and 2.) We applied the rapid bootstrap algorithm [46] with a subsequent tree search. The numbers of bootstrap replicates were estimated on the fly by the "bootstopping" criteria implemented in RAxML 7.2.8 (default settings) [47]. The analyses yielded two trees. These trees are referred to as "tree 1" (corresponding to subset 1) and "tree 2" (corresponding to subset 2). Trees were edited in Dendroscope [48] [XVIII]."
Apparently no character weights were applied. Note, however, that the alignment was masked as described under Alignment Method. RAxML 7.2.8from the publication section "Multiple sequence alignment and alignment masking":
"Orthologous sequences were aligned with MAFFT v6.712b using the auto option [IV]. Depending on the size of an alignment, MAFFT automatically chooses a suitable alignment option, such as L-INS-i for < 200 sequences and FFT-NS-2 for > 2,000 sequences [33,34]. All alignments were subsequently refined with the refinement option in MUSCLE version 3.7 [35] [V]. These are powerful alignment tools that allow processing very large data sets in reasonable time. Steps II through VI of our pipeline are automatically consecutively executed when using the script batch2_IItoVI.sh. (See the manual of batch scripts for details.) Aligned and refined mitochondrial amino acid sequences were then translated back into nucleotide sequences with the aid of the script aa2dna, which uses the corresponding reading frame information from the GenBank file [VI]. From this point on, we proceeded with nucleotide sequences for all mitochondrial sequences and nuclear noncoding sequences, as well as with amino acid sequences for the nuclear coding sequences (available since step [a.III]).

Ambiguously aligned or highly diverged regions of the alignment were masked with three different algorithms [VII]. We applied ALISCORE [36,37] and ALICUT [38] for noncoding nucleotide sequences and for nuclear amino acid sequences (default settings). Since the multiple sequence alignment of 28S rRNA was too big to be processed with ALISCORE, we used Gblocks 0.91b [39,40] for 28S instead (block parameter settings: (1) number of included seq/2 = 1020, (2) 1020, (3) 5, (4) 10, and (5) all). Finally, we used the script gapkiller to identify and delete sites with more than 70% gaps in coding mitochondrial sequences. Then we masked all third codon positions of mitochondrial coding sequences [VIII] and concatenated all tRNA alignments to one single alignment."
not manually correctedGenBank accession numbers are providedthe mapping is implicitthe mapping is not providedNot relevantNot relevantBranch lengths are applicable and provided. Branch support values are applicable and provided. Peters_2011_hymenopteraRalph S Peters, Benjamin Meyer, Lars Krogmann, Janus Borner, Karen Meusemann, Kai Schütte, Oliver Niehuis and Bernhard Misof2011The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequencesBMC Biology 2011, 9:5510.1186/1741-7007-9-55Tree was obtained by personal communication from the first author to Arlin Stoltzfus, 29 Jan, 2013. http://www.evoio.org/wiki/File:Tree_1_Peters_et_al.trenot availablemost likely tree by incomplete searchWe infer that the tree is rooted by an outgroup method. Outgroups are specified in "Sequence data retrieval and data processing" but they do not appear in the tree. The authors say that the tree was "edited" in Dendroscope, but do not explain the nature of the editing. We infer that the authors removed the outgroups from the final tree. We assume in this case that the row labels match the tree labels. GenBank accession numbers are provided, but they are not mapped to species names. Binomials are provided in some cases. In other cases, we have <genus>_sp. Based on reading the section "Species and sequence subset selection", we are unable to determine what these entities represent. However, based on NCBI taxonomy searches, we find that these names correspond to entities in GenBank with incomplete names, e.g., Sania_sp_5 is presumably NCBI's "Sania sp. 5 JCB-2006". See methods for the meaning of support values.
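Illustrative note on row 4: the quoted masking step that deletes sites with more than 70% gaps can be sketched in Python as below. This is not the authors' gapkiller script, whose implementation is not described here; the function name, the dictionary-based alignment, the use of "-" as the only gap character, and the toy sequences are all assumptions for this example.

```python
def drop_gappy_sites(alignment, max_gap_fraction=0.7, gap_chars="-"):
    """Remove alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment maps sequence names to aligned strings of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have the same aligned length")

    keep = []
    for col in range(length):
        gaps = sum(1 for seq in seqs if seq[col] in gap_chars)
        # Keep the site unless strictly more than the threshold are gaps.
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(col)

    return {name: "".join(alignment[name][col] for col in keep) for name in names}


# Hypothetical toy alignment: the third column is all gaps and is removed.
toy = {
    "taxon1": "AC-GT",
    "taxon2": "A--GT",
    "taxon3": "AC-G-",
    "taxon4": "A--GT",
}
filtered = drop_gappy_sites(toy)
for name, seq in filtered.items():
    print(name, seq)
```

The threshold is a parameter so the same sketch can be reused with other cut-offs.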
5
1/29/2013 14:06:54Andreaspecies treerootedmethod follows that described by Smith et al., 2009Phylogenetic trees were inferred using the Pthreads-based and SSE3-vectorized RAxML (Stamatakis, 2006b) version 7.2.6. The post-analysis steps (consensus tree building, evaluating the final trees under the GTR+GAMMA model etc.) were carried out with RAxML v7.2.7. We used the standard RAxML search algorithm with the asymptotic stopping rule and the low memory consumption flag (-F and -D options) to infer 223 ML trees on the original alignment under the GTR+CAT approximation of rate heterogeneity (Stamatakis, 2006a) and a partitioned model (we estimated the GTR and alpha parameters separately for each gene) with a joint branch length estimate. The usage of the GAMMA model of rate heterogeneity was not possible on all multi-core systems we used for the analysis because of memory limitations (a run under GTR+CAT required approximately 30GB of main memory, a run under GAMMA requires approximately four times more memory). Branch lengths and likelihood scores under GTR+GAMMA for all 223 ML trees were computed using the -f n option.

We also inferred 244 bootstrap trees using the RAxML rapid bootstrap algorithm (Stamatakis et al., 2008). We then plotted BS support values onto the best-scoring ML tree and also computed strict, majority-rule, and extended majority rule consensus trees for the bootstrap replicates and the ML trees on the original alignment. We also applied the bootstopping (bootstrap convergence) tests (Pattengale et al., 2010) a posteriori to the bootstrap trees. The test indicated that an insufficient number of BS replicates has been computed to guarantee stable support values. Finally we computed pair-wise Robinson Foulds (RF) distances between all ML trees (average relative RF: 21.79%) and all bootstrap trees (average relative RF: 53.32%).
n/aRAxML v7.2.7from the paper, pg 408: "Once sequences are identified to belong to the gene regions of interest, saturation analyses are conducted comparing uncorrected genetic distances to corrected distances. If alignments appear to be saturated, the alignments are broken up using prior phylogenetic knowledge (classification systems) as guides, and separate alignments are carried out for the individual groups delimited in this way. These individual alignments are then aligned together using profile-to-profile alignment techniques (Edgar, 2004). Our final concatenated data set included 55 473 species and 9853 aligned sites (Appendix S1; see Supplemental Data with the online version of this article)."not manually correctedDNAaccession numbers are not providedn/a, no character matrixn/a, no character matrixbinomials, some with citations (those with sp)Collections information is not providedGeoreference information is not providedBranch lengths are applicable but not provided.Branch support values are applicable but not provided.Smith et al 2011 AngiospermsSmith SA, Beaulieu JM, Stamatakis A, Donoghue MJ2011Understanding angiosperm diversification using small and large phylogenetic trees. American Journal of Botany 98(3): 404-414. doi:10.3732/ajb.1000481Smith SA, Beaulieu JM, Stamatakis A, Donoghue MJ (2011) Understanding angiosperm diversification using small and large phylogenetic trees. American Journal of Botany 98(3): 404-414. doi:10.3732/ajb.1000481doi:10.3732/ajb.1000481Builds on APG III treehttp://dx.doi.org/10.5061/dryad.8790n/aSupplemental files published with paper located at: http://www.amjbot.org/content/98/3/404/suppl/DC1

Supplemental files on Dryad:
http://datadryad.org/handle/10255/dryad.8790

consensus treefrom the paper: "The data set was assembled using the methods described in Smith et al. (2009), as implemented in the PHLAWD program."
6
1/29/2013 14:08:54Arlinspecies treerooted(not applicable)The tree topology is the product of a community of hundreds of contributors (curators) that manage particular branches of the tree. There is no fixed method for inferring trees. Not applicable. (not applicable)(not applicable)(not applicable)(not applicable)(not applicable)(not applicable)(not applicable)Meaningful external identifiers are not provided(not applicable)(not applicable)Branch lengths are applicable but not provided.Branch support values are applicable but not provided.ToLWeb2006David R. Maddison, Katja-Sabine Schulz, Wayne P. Maddison2007The Tree of Life Web ProjectZootaxa 1668: 19–40The tree was downloaded from ToLWeb in 2012, but apparently this version is from 2006, which is why we have named it ToLWeb2006. http://www.evoio.org/wiki/File:TOL.xml.zipThis information is available in interactive form at http://www.tolweb.org . See Methods belowBinomials are provided, but not references to an external namebank. This is promoted as a phylogeny, therefore branch lengths and support values are appropriate. However, the tree lacks these features.
7
1/29/2013 14:13:29Enricospecies treerootedMAFFT and PRANKPseudo-posterior samples of complete avian trees were assembled as follows. (1) Every bird species was assigned to one of 158 clades identified using a backbone phylogeny [27]. (2) Relaxed-clock trees were generated for each clade from sequence data. (3) Relaxed-clock trees for entire clades were generated combining species with and without genetic data: species without genetic information (3,330) were placed within their clade using constraint structures consistent with consensus trees from step (2) plus taxonomic information and branching times sampled from a pure birth model of diversification. (4) Final trees were assembled from the clade distributions plus samples of dated backbone trees from (one of two) distributions constructed using relaxed molecular clock techniques, 15 genes, ten fossil constraints and extensive topology constraints derived from published sources.multiple methodsmanually correctedDNAGenBank accession numbers are providedthe mapping is providedthe mapping is providedMeaningful external identifiers are not providedCollections information is not providedGeoreference information is not providedBranch lengths are applicable and provided. Branch support values are applicable and provided. Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K., Mooers, A. O.2012The global diversity of birds in space and timeNature, Vol. 491:444, 2012doi:10.1038Start from backbone phylogenyheight_median, height, height_95%_HPD, length, height_range, posterior
8
1/29/2013 14:34:13Ramonaspecies treerootedAMPHORAA maximum likelihood tree was then constructed from the concatenated-alignment using PHYML [10]. The model selected based on the likelihood ratio test was the WAG model of amino acid substitution with γ-distributed rate variation (five categories) and a proportion of invariable sites. The shape of the γ-distribution and the proportion of the invariable sites were estimated by the program.
no weights givenPHYMLNo parameters listed in publication. not manually correctedproteinaccession numbers are not providedthe mapping is providedno accession numbersMeaningful external identifiers are not providedCollections information is not providedGeoreference information is not providedBranch lengths are applicable and provided. Branch support values are applicable but not provided.Genomic Encyclopedia of Bacteria and Archaea treeWu D., Hugenholtz P., Mavromatis K., Pukall R., Dalin E., Ivanova N.N., Kunin V., Goodwin L., Wu M., Tindall B.J., Hooper S.D., Pati A., Lykidis A., Spring S., Anderson I.J., D’haeseleer P., Zemla A., Singer M., Lapidus A., Nolan M., Copeland A., Chen F., Cheng J., Lucas S., Kerfeld C., Lang E., Gronow S., Chain P., Bruce D., Rubin E.M., Kyrpides N.C., Klenk H., Eisen J.A. 2009A phylogeny-driven genomic encyclopaedia of Bacteria and ArchaeaWu D., Hugenholtz P., Mavromatis K., Pukall R., Dalin E., Ivanova N.N., Kunin V., Goodwin L., Wu M., Tindall B.J., Hooper S.D., Pati A., Lykidis A., Spring S., Anderson I.J., D’haeseleer P., Zemla A., Singer M., Lapidus A., Nolan M., Copeland A., Chen F., Cheng J., Lucas S., Kerfeld C., Lang E., Gronow S., Chain P., Bruce D., Rubin E.M., Kyrpides N.C., Klenk H., & Eisen J.A. 2009. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462(7276): 1056-1060.DOI: 10.1038/nature08656 Will add files to project github site.
TreeBASE: http://purl.org/phylo/treebase/phylows/study/TB2:S10965
most likely tree by incomplete searchA maximum likelihood phylogenetic tree for bacterial genomes was built upon a concatenated alignment of 31 phylogenetic marker genes. We included 53 GEBA bacteria and 667 bacterial complete genomes from GenBank for the tree building. Protein sequences for 31 phylogenetic marker genes (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, and tsf) were retrieved, aligned, trimmed, and concatenated using the software AMPHORA [9]. A maximum likelihood tree was then constructed from the concatenated-alignment using PHYML [10]. The model selected based on the likelihood ratio test was the WAG model of amino acid substitution with γ-distributed rate variation (five categories) and a proportion of invariable sites. The shape of the γ-distribution and the proportion of the invariable sites were estimated by the program.
Many taxon names have strain IDs as well as binomials.Species names as binomials are provided but not mapped to a database. No information is provided for collection numbers or georeferences, but there are strain numbers for some of the taxa.
9
1/29/2013 15:33:22Ramonaspecies treenot rootedSINA, as implemented by the SILVANo weighting, but: "To exclude positions where positional orthology could not be guaranteed in the alignment, three filter sets were applied to remove positions where the highest occurring base was conserved at less than 30%, 40% and 50% (Table 2)."RAxML version 7.xno parameters given in publicationmanually correctedRNAnon-GenBank accession numbers are providedmapping is implicit because both binomial and accession number are included in the topologyimplicit because row names are the same as tip namesbinomials are included in the row names and topologyThere is an internal accession number as part of each species nameGeoreference information is not providedBranch lengths are applicable and provided. Branch support values are applicable but not provided.all species 16SPablo Yarza, Michael Richter, Jörg Peplies, Jean Euzeby, Rudolf Amann, Karl-Heinz Schleifer, Wolfgang Ludwig, Frank Oliver Glöckner, Ramon Rosselló-Móra2008The All-Species Living Tree project: A 16S rRNA-based phylogenetic tree of all sequenced type strainsP. Yarza, et al., The All-Species Living Tree project: A 16S rRNA-based phylogenetic tree of all sequenced type strains, Syst. Appl. Microbiol. (2008), doi:10.1016/j.syapm.2008.07.001doi:10.1016/j.syapm.2008.07.001Will upload tree and matrix to github.
Tree is regularly updated. Publication is from 2008, but most recent tree was released in July 2012. See http://www.arb-silva.de/projects/living-tree/ for link to latest tree and methods.
most likely tree by incomplete searchThe visual images of the tree do not show a root, and the publication/website do not specify any rooting method.

Sequences had been automatically aligned by SINA, as implemented by the SILVA database project [24]. Briefly, the system searches for the closest relatives in a set of 51,601 manually curated SSU sequences (Seed). Up to 40 related sequences are then used as references for the alignment of the sequence under investigation. Although the process is highly accurate, some of the bases usually escape optimal placement according to biological criteria. The complete dataset of 9975 sequences (type strains and non-type strains) was manually checked in order to improve inaccurately placed bases. For this, the secondary structure of the SSU was taken into account. The final alignment can be retrieved as an ARB database, as well as supplementary material in an aligned multi-FASTA file, and from www.arb-silva.de/living-tree.

The original publication describes the tree as inferred using RAxML version 7.0, but presumably the latest tree was constructed using a more recent version.
Example of a taxon name from the newick file: "Pantoea_calida__GQ367478__Enterobacteriaceae"
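Illustrative note on row 9: since the binomial and accession number are embedded in each tip label, the implicit mapping can be recovered by splitting the label. The Python sketch below assumes, from the single example above only, that a double underscore separates binomial, accession, and family and that single underscores stand for spaces in the binomial; the class and function names are hypothetical, and other labels in the Newick file may not follow this pattern.

```python
from typing import NamedTuple


class TipLabel(NamedTuple):
    binomial: str
    accession: str
    family: str


def parse_tip_label(label):
    """Split a tip label of the assumed form Genus_species__ACCESSION__Family."""
    parts = label.split("__")
    if len(parts) != 3:
        raise ValueError(f"unexpected tip label format: {label!r}")
    name, accession, family = parts
    # Single underscores inside the name are assumed to replace spaces.
    return TipLabel(name.replace("_", " "), accession, family)


tip = parse_tip_label("Pantoea_calida__GQ367478__Enterobacteriaceae")
print(tip.binomial, tip.accession, tip.family)
```

Labels that do not split into exactly three fields raise an error rather than being guessed at.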
10
1/29/2013 15:37:17Andreaspecies treenot rootedMuscle, Mafft"The combined data set with morphology plus molecules and the molecule-only data set were analysed separately. Tree searches, identical for molecular and combined data sets, ran in parallel on three computers (totalling 16 processors and 96 GB RAM), examining for each data set ∼ 7.5 × 10^14 rearrangements in ∼ 2.5 months’ processor-time. To estimate ambiguity, for each data set we used eight independent replicates with tree bisection reconnection (TBR) followed by sectorial searches (see details below). The best tree for each of the two data sets was found by combining the eight independent trees with tree-fusing (Goloboff, 1999; Goloboff and Pol, 2007) and then subjecting the fused tree to sectorial search, as detailed below.

For each starting point, TBR-swapping for the molecule-only trees saved 50 000–56 000 steps from the Wagner trees, and for the combined data set, about 35 000 steps. After concluding TBR, each of the resulting trees was subjected to a sectorial search routine, analysing in parallel 16 sectors (with a size of ∼ 4500 each) at the same time. Each sector was analysed for up to 4 h, with the following commands (see documentation of TNT for details):

bbreak:

cluster 20;

timeout 4:00:00;

sectsch:

xss 15-8+6-2 gocomb 50

combst 5 fuse 4 slack 20 drift 7;

xmult =

repl 8 rss xss drift 4 hit 10

dumpfuse keep; tfuse; best;

The tree-fusing at the end (tfuse command) guarantees that the final solution for the sector is no worse than the initial one. The results for the sectors were merged, and the resulting tree was subject to TBR in parallel (using three slave processes per machine, the maximum allowed by the RAM available in each machine, for a total of nine slave processes in the virtual machine). This alternation between sectorial search and TBR was repeated in 5–7 cycles, slightly changing the sectors selected, and the random seeds used for searching new solutions for each of the sectors. In the final cycles (as the trees approached optimality), the virtual machine examined about 740 × 10^6 rearrangements/s, requiring between 0.5 and 2 h to complete TBR.

The trees resulting from each of the eight independent starting points were then subjected to several rounds of tree-fusing, and the resulting tree was subjected to three cycles of alternating sectorial search and TBR in parallel, but in this case breaking the tree into only seven pieces (sectors of about 10 000 taxa), and running each sector for up to 16 h. Each reduced data set was analysed by means of the following commands:

bbreak:

cluster 20;

sectsch:

xss 32-25 + 5-1 gocomb 50 combst 5

fuse 4 slack 20 drift 5;

timeout 8:00:00;

xmult =

repl 8 rss xss drift 4 hit 10

dumpfuse keep prvmix;

tfuse; tchoose/;

sectsch = xss5-3 + 1-1

[ sectsch: xss10 + 3-1;

xmult =

xss rss hit 1 rep 8 nofu keep;

tfuse; best; ];

The search commands indicated within square brackets are those to be used for analysis of the (five to three) sectors in which the sectorial search command (sectsch) will further partition each reduced tree of ∼ 10 000 taxa."
TNT - http://www.zmuc.dk/public/phylogeny/TNT/Scripts"All sequences other than LSU and SSU were aligned with Muscle (Edgar, 2004). Nuclear LSU and SSU were aligned with Mafft (Katoh et al., 2005; Katoh, 2008). The alignment of LSU and SSU involved the following steps: (i) separate the complete data set in subsets of approximately 2000 sequences; (ii) align each data set separately using the Mafft option of considering a previously aligned sequence as a “template” for the multiple alignment [17 LSU and 70 SSU sequences, downloaded from the European ribosomal database (http://bioinformatics.psb.ugent.be/webtools/rRNA/ssu and http://bioinformatics.psb.ugent.be/webtools/rRNA/lsu/index.html), which take into account structural considerations]; (iii) find conserved regions common to all the aligned data sets; (iv) subdivide “vertically” each subset of 2000 sequences at the conserved regions identified in step 3, producing data sets of 2000 species per ∼ 50–200 bp each; (v) combine each corresponding partial data set obtained in step 4 and generate a data set of 20 000 species per 50–200 bp; (vi) erase the gaps; (vii) perform a multiple alignment with the data set of step 6; (viii) manually adjust the alignments."manually correctedprotein and DNAaccession numbers are not providedthe mapping is not providedthe mapping is not providedMeaningful external identifiers are providedCollections information is not providedGeoreference information is not providedBranch lengths are applicable but not provided.Branch support values are applicable but not provided.GenBank EukaryotesGoloboff, P. A., Catalano, S. A., Mirande, J. M., Szumik, C., Arias, J. S., Kallersjo, M., & Farris, J. S. 2009Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groupsGoloboff, P. A., Catalano, S. A., Mirande, J. M., Szumik, C., Arias, J. S., Kallersjo, M., & Farris, J. S. (2009). Phylogenetic analysis of 73 060 taxa corroborates major eukaryotic groups. Cladistics, 25, 211–230. 
doi:10.1111/j.1096-0031.2009.00255.x10.1111/j.1096-0031.2009.00255.xhttp://www.zmuc.dk/public/phylogeny/TNT/More/Supp_Data_Set.tgzn/aMost parsimonious tree by incomplete (nonexhaustive) search"Where possible and appropriate, we used amino acid sequences (COX I–III, CytB, RNAPII). The resulting alignments were inspected visually and, when possible, improved manually; regions that were too gappy were excluded from the final data sets. Given the obviously problematic nature of the alignments and incomplete sequences (many gaps are lack of data, not real deletions), we considered gaps as missing. Multiple sequences for the same species were excluded to maximize taxonomic diversity instead of simply using large numbers of identical sequences."Normally we do not count binomials as IDs, but in this case we do, because the authors explicitly and algorithmically take their taxonomy from NCBI.
11
1/29/2013 16:02:08Arlinspecies treerooted(not applicable)From Federhen, 2012
"It was obviously important to provide a single taxonomic classification to index the entire set of entries in Entrez. The first step was to shuffle together the taxonomies from each of the contributing databases, each of which covered a somewhat different set of species with often very different internal classifications. The end result of this process was a hideous abomination, but it did provide a single classification that spanned all of the entries in Entrez, which we set out to improve. At this point we hosted series of taxonomy workshops to provide advice and direction for the project. David Hillis, John Taylor and Gary Olsen, in particular, put in a significant amount of time and effort in the initial cleanup of our merged classification.

The next step forward was the 1997 agreement by the INSDC members to resolve taxonomic issues of nomenclature and classification prior to the release of new sequence data . . ."

"We try to maintain a phylogenetic taxonomy—one in which the structure of the classification corresponds with the evolutionary history of the tree of life. A phylogenetic classification aims to include only monophyletic groups—groups in which all of the members are more closely related to each other than any of them are to anything outside of the group. . . "

"There are several large taxonomy database projects that seek to aggregate names from other sources into more or less comprehensive collections—the Catalog of Life, the Encyclopedia of Life, NameBank and WikiSpecies, for example. These are useful resources for the taxonomy group when we research the names that we add to our database, and we maintain reciprocal links with many of them. Even more useful are the curated specialty databases that are devoted to a particular group—IPNI for the plants, Index Fungorum and MycoBank for the fungi, Algaebase for the algae, AmphibiaWeb and Amphibian Species of the World for the amphibians, the Catalog of Fishes and FishBase for the fish, Bergey's Manual for the prokaryotes and so on. More than 150 outside groups are registered to maintain LinkOut (http://www.ncbi.nlm.nih.gov/projects/linkout/) links in the NCBI Taxonomy database. But in every case, the ultimate authoritative source for the nomenclature and classification is the primary taxonomic literature itself."
(not applicable)(not applicable)(not applicable)(not applicable)(not applicable)(not applicable)(not applicable)(not applicable)Meaningful external identifiers are provided(not applicable)(not applicable)(not applicable)(not applicable)NCBI_29Jan2013 ftp://ftp.ncbi.nih.gov/pub/taxonomy/(not applicable)We obtained the Newick conversion of the tree, with species names, and collapsed unbranched nodes, from the IToL web site (http://itol.embl.de/other_trees.shtml). The precise URL is http://itol.embl.de/ncbi_tree/ncbi_complete_collapsed_with_names.newick.gz and this was downloaded Jan 29, 2013. (not applicable)The resource is linked to the terms of the NCBI taxonomy database.