Datasets from UCSD

Authors and/or software name

Publication title, journal (year)

Paper DOI

Dataset Address

Moshiri and Mirarab, Dual-birth

A Two-State Model of Tree Evolution and Its Applications to Alu Retrotransposition

10.1093/sysbio/syx088

https://doi.org/10.5061/dryad.13n52 

Sayyari and Mirarab, Polytomy test

Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies

10.3390/genes9030132

https://github.com/esayyari/polytomytest

https://gitlab.com/esayyari/polytomy  

Sayyari and Mirarab, Fragmentary data

Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction

10.1093/molbev/msx261

http://esayyari.github.io/fragments.html 

Sayyari and Mirarab, DISTIQUE

Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction

10.1186/s12864-016-3098-z

http://esayyari.github.io/DISTIQUE 

Mai and Mirarab, MV rooting

Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction

10.1371/journal.pone.0182238

https://uym2.github.io/MinVar-Rooting/ 

Mai and Mirarab, TreeShrink

TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees

10.1186/s12864-018-4620-2

https://uym2.github.io/TreeShrink/ 

Zhang et al., ASTRAL-III

ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees

10.1186/s12859-018-2129-y

https://gitlab.com/esayyari/ASTRALIII 

Sayyari and Mirarab, LocalPP

Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies

10.1093/molbev/msw079

https://esayyari.github.io/FastLocalBranchSupport.html


Datasets formerly on utexas.edu

Below is a list of datasets that used to be at http://www.cs.utexas.edu/users/phylo/ but are now backed up here.

Authors and/or software name

Publication title, journal (year)

Paper DOI

Old Address

New Address

Mirarab et al., ASTRAL-I

ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation, Bioinformatics (2014)

10.1093/bioinformatics/btu462

http://www.cs.utexas.edu/~phylo/datasets/astral/ 

http://www.cs.utexas.edu/~phylo/software/astral/ 

https://sites.google.com/eng.ucsd.edu/datasets/astral 

Mirarab and Warnow, ASTRAL-II

ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes, Bioinformatics (2015)

10.1093/bioinformatics/btv234

http://www.cs.utexas.edu/~phylo/datasets/astral2/ 

https://sites.google.com/eng.ucsd.edu/datasets/astral/astral-ii 

Mirarab et al., Statistical binning

Statistical binning enables an accurate coalescent-based estimation of the avian tree, Science (2014)

10.1126/science.1250463

http://www.cs.utexas.edu/~phylo/datasets/binning

https://sites.google.com/eng.ucsd.edu/datasets/binning 

Mirarab et al., Response to comment on binning

Response to Comment on “Statistical Binning Enables an Accurate Coalescent-Based Estimation of the Avian Tree”, Science (2015)

10.1126/science.aaa7719

http://www.cs.utexas.edu/~phylo/datasets/binning-response

https://sites.google.com/eng.ucsd.edu/datasets/binning 

Bayzid et al., Weighted statistical binning

Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses, PLOS One (2015)

10.1371/journal.pone.0129183

http://www.cs.utexas.edu/~phylo/datasets/weighted-binning

https://sites.google.com/eng.ucsd.edu/datasets/binning 

Bayzid and Warnow, Naive binning

Naive binning improves phylogenomic analyses, Bioinformatics (2013)

10.1093/bioinformatics/btt394

http://www.cs.utexas.edu/users/phylo/datasets/ILS/ 

https://sites.google.com/eng.ucsd.edu/datasets/ils-small 

Zimmermann, Mirarab, and Warnow, BBCA

BBCA: Improving the scalability of *BEAST using random binning, BMC Genomics (2014)

10.1186/1471-2164-15-S6-S11

http://www.cs.utexas.edu/~phylo/datasets/bbca/ 

https://sites.google.com/eng.ucsd.edu/datasets/binning 

Mirarab et al., PASTA

PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences, Journal of Computational Biology (2014)

10.1089/cmb.2014.0156

http://www.cs.utexas.edu/~phylo/software/pasta/ 

https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp 

Mirarab et al., SEPP

SEPP: SATé-Enabled Phylogenetic Placement, Pacific Symposium on Biocomputing (2012)

10.1142/9789814366496_0024

http://www.cs.utexas.edu/~phylo/software/sepp/submission/ 

http://www.cs.utexas.edu/~phylo/software/sepp/ 

https://sites.google.com/eng.ucsd.edu/datasets/microbiome/sepp 

Nguyen et al., UPP

Ultra-large alignments using phylogeny aware profiles, Genome Biology (2015)

10.1186/s13059-015-0688-z

http://www.cs.utexas.edu/users/phylo/software/upp/ 

https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp 

Nguyen et al., TIPP

TIPP: Taxonomic Identification and Phylogenetic Profiling, Bioinformatics (2014)

10.1093/bioinformatics/btu721

http://www.cs.utexas.edu/users/phylo/software/sepp/tipp-submission/ 

https://sites.google.com/eng.ucsd.edu/datasets/microbiome/tipp 

Mirarab and Warnow, FastSP

FastSP: Linear Time Calculation of Alignment Accuracy, Bioinformatics (2011)

10.1093/bioinformatics/btr553

http://www.cs.utexas.edu/~phylo/software/fastsp

https://sites.google.com/eng.ucsd.edu/datasets/alignment/fastsp 

Liu et al., SATe-I

Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and Phylogenetic Trees, Science (2009)

10.1126/science.1171243

http://www.cs.utexas.edu/~phylo/sate/public/sate_journal.html 

https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i 

Swenson et al., SuperFine

SuperFine: fast and accurate supertree estimation, Systematic Biology (2012)

10.1093/sysbio/syr092

http://www.cs.utexas.edu/users/phylo/software/superfine/ 

https://sites.google.com/eng.ucsd.edu/datasets/dactalsuperfine 

Nelesen et al., DACTAL

DACTAL: divide-and-conquer trees (almost) without alignments, Bioinformatics (2012)

10.1093/bioinformatics/bts218

http://www.cs.utexas.edu/users/phylo/software/dactal/ 

https://sites.google.com/eng.ucsd.edu/datasets/dactalsuperfine 

Swenson et al., SMIDGEN

A simulation study comparing supertree and combined analysis methods using SMIDGen, Algorithms for Molecular Biology (2010)

10.1186/1748-7188-5-8

http://www.cs.utexas.edu/users/mswenson/pubs/

https://sites.google.com/eng.ucsd.edu/datasets/dactalsuperfine

Liu and Warnow, BeeTLe

Treelength Optimization for Phylogeny Estimation, PLOS One (2012)

10.1371/journal.pone.0033104

http://www.cs.utexas.edu/users/phylo/datasets/treelength/ (data)

http://www.cs.utexas.edu/users/phylo/software/beetle/beetle-may-17-2013.tar.bz2 (software)

https://sites.google.com/eng.ucsd.edu/datasets/dactalsuperfine

Liu et al.

RAxML and FastTree: Comparing Two Methods for Large-Scale Maximum Likelihood Phylogeny Estimation, PLOS One (2012)

10.1371/journal.pone.0027731

http://www.cs.utexas.edu/~phylo/sate/public/sate_journal.html 

https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i 

Yang and Warnow

Fast and accurate methods for phylogenomic analyses, BMC Bioinformatics (2011)

10.1186/1471-2105-12-S9-S4

http://www.cs.utexas.edu/users/phylo/datasets/ILS/ 

https://sites.google.com/eng.ucsd.edu/datasets/ils-small 

Bayzid, Hunt, and Warnow

Disk covering methods improve phylogenomic analyses, BMC Genomics (2014)

10.1186/1471-2164-15-S6-S7

http://www.cs.utexas.edu/users/phylo/software/dcm-protocol/ (scripts)

http://www.cs.utexas.edu/~phylo/datasets/astral/ (datasets)

https://sites.google.com/eng.ucsd.edu/datasets/ils-small (scripts)

https://sites.google.com/eng.ucsd.edu/datasets/astral/astral-i (datasets)

On other archives

The following papers did not have any data at utexas.edu.

Liu et al., POY*

Barking up the wrong treelength: the impact of gap penalty on alignment and tree accuracy, TCBB (2009)

10.1109/TCBB.2008.63

Data not made available, but the paper details how to generate the simulated data and gives TreeBase accession numbers.

n.a.

Mirarab, Bayzid, and Warnow

Evaluating summary methods for multi-locus species tree estimation in the presence of incomplete lineage sorting, Systematic Biology (2014)

10.1093/sysbio/syu063

http://datadryad.org/resource/doi:10.5061/dryad.310q3 

Barbancon et al., Simulated linguistic datasets

An experimental study comparing linguistic phylogenetic reconstruction methods, Diachronica (2013)

10.1075/dia.30.2.01bar

Data not made available, but the paper details how to generate the simulated data. (Contact Warnow if interested in obtaining the data.)

Liu et al., SATe-II

SATe-II: Very Fast and Accurate Simultaneous Estimation of Multiple Sequence Alignments and Phylogenetic Trees, Systematic Biology (2012)

10.1093/sysbio/syr095

16S.B.ALL and 16S.T: doi:10.5061/dryad.n9r3h

In addition to Dryad:

16S.B.ALL and 16S.T: https://sites.google.com/eng.ucsd.edu/datasets/alignment/16s23s 

and

https://gitlab.msu.edu/liulab/SATe-II-simulated-100-taxon-datasets