Metagenomics - Tools, Methods and Madness

Jonathan Jacobs / @bioinformer

jonathan.jacobs@gmail.com 

:)

THIS LIST IS OUTDATED, but now that Zotero citations work for Google Docs, I’m in the process of fixing it up (time permitting).

A SLIGHTLY LESS OUTDATED GOOGLE SHEET IS LOCATED HERE, but I still prefer the long-form document (this one).

PLEASE LEAVE A COMMENT ON THE BLOG I SET UP AND I’LL UPDATE THE GOOGLE SPREADSHEET AS I HAVE TIME.

This material assembled with thanks to

And tons of other people. If you have contributed, please leave a comment here and let me know what your Twitter handle is…


The goal of this document is to capture current tools, methods and the overall madness of metagenomics as a science and the emerging commercial field. Wherever possible, I’ll add links / references to each resource, etc. but THIS IS BY NO MEANS COMPLETE.

SHOTGUN METAGENOMICS ANALYSIS METHODS

The majority of the methods outlined below are intended for community profiling, not for determining whether a specific pathogen is present in or absent from the profile. The implied workflow is to use these tools for an initial profile of a sample and then assess potential pathogens of interest from that data using simple filtering.
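A minimal sketch of that "simple filtering" step, assuming the profiler's output has already been reduced to a plain dict of per-taxon read counts. The function name `flag_watchlist_taxa`, the example watchlist, and both thresholds (`min_reads`, `min_fraction`) are hypothetical illustration choices, not defaults from any tool in this document.

```python
def flag_watchlist_taxa(profile, watchlist, min_reads=10, min_fraction=1e-4):
    """Return watchlist taxa from a community profile that pass simple filters.

    profile:   dict mapping taxon name -> assigned read count (hypothetical format)
    watchlist: set of taxon names of interest (e.g. potential pathogens)
    """
    total = sum(profile.values())
    hits = []
    for taxon, reads in profile.items():
        if taxon not in watchlist or total == 0:
            continue
        fraction = reads / total
        # Require both an absolute read floor and a relative-abundance floor
        if reads >= min_reads and fraction >= min_fraction:
            hits.append((taxon, reads, fraction))
    # Most abundant hits first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


profile = {
    "Bacteroides fragilis": 31000,
    "Escherichia coli": 5200,
    "Salmonella enterica": 12,
    "Bacillus anthracis": 3,
}
watchlist = {"Salmonella enterica", "Bacillus anthracis"}
print(flag_watchlist_taxa(profile, watchlist))
```

With these toy counts only *Salmonella enterica* is reported; *Bacillus anthracis* falls below the 10-read floor. Where to put those floors is the hard part, since very low-abundance hits are where misclassification and contamination tend to show up.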

THIS LIST IS MOVING TO A GOOGLE SHEET LOCATED HERE.

It got too big to maintain as a text document…

Targeted Amplicon Sequencing

Examples include 16S rRNA analysis, etc., but I’m not actively updating this section [last edit ~2015]. Interestingly, most commercial companies simply use these tools or some variant of them.

DADA2 (2016)

  1. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016 Jul;13(7):581-3. doi: 10.1038/nmeth.3869. Epub 2016 May 23. PubMed PMID: 27214047; PubMed Central PMCID: PMC4927377.
  2. https://github.com/benjjneb/dada2

metagenomeSeq Bioconductor package (2013)

  1. http://cbcb.umd.edu/software/metagenomeSeq
  2. Paulson, J. N., Stine, O. C., Bravo, H. C. & Pop, M. Differential abundance analysis for microbial marker-gene surveys. Nat. Methods 10, 1200–2 (2013).

MOTHUR (2009-2014)

  1. http://www.mothur.org
  2. Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–41 (2009).

CloVR (2011)

  1. http://www.clovr.org
  2. Angiuoli, S. V et al. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics 12, 356 (2011).

QIIME (2010)

  1. http://qiime.org
  2. Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–6 (2010).

METHODS TO GENERATE SYNTHETIC READS

Bear (2014)

  1. https://github.com/sej917/BEAR
  2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168713/

NeSSM (2013)

  1. http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php
  2. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0075448

cMESSI (2012)

  1. https://sourceforge.net/projects/cmessi/
  2. http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031386

Grinder (2012)

  1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3384353/

MetaSim (2008)

  1. http://ab.inf.uni-tuebingen.de/software/metasim
  2. Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008): MetaSim—A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3(10): e3373. doi:10.1371/journal.pone.0003373
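The simulators above differ mainly in their error and abundance models, but the core idea they share can be sketched as: pick a source genome in proportion to its relative abundance, sample a read from a random position and strand, then add sequencing errors. This minimal sketch assumes uniform substitution errors only (no indels, no quality scores) and treats abundance directly as the per-read sampling probability. `simulate_reads` and its parameters are hypothetical, not the interface of any tool listed here.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simulate_reads(genomes, abundances, n_reads, read_len=100,
                   error_rate=0.01, seed=0):
    """Sample error-prone reads from genomes in proportion to abundance.

    genomes:    dict name -> sequence (each at least read_len long)
    abundances: dict name -> relative abundance (used as sampling weight)
    Returns a list of (read_id, sequence) tuples.
    """
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for i in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        if len(genome) < read_len:
            raise ValueError(f"{name} is shorter than read_len")
        start = rng.randrange(len(genome) - read_len + 1)
        read = genome[start:start + read_len]
        if rng.random() < 0.5:  # reads come from either strand
            read = reverse_complement(read)
        # Uniform substitution errors: swap a base for a different one
        bases = list(read)
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((f"{name}_{i}", "".join(bases)))
    return reads
```

Weighting by abundance alone means a small genome and a large one at equal abundance contribute equal numbers of reads. If abundance is meant as cell abundance, weighting by abundance times genome length is the closer model, so it is worth checking which convention a given simulator uses before scoring a profiler against its truth set.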

FUNCTIONAL CHARACTERIZATION PIPELINES

http://www.nature.com/nbt/journal/v31/n6/abs/nbt.2579.html

HUMAnN2 (2017)

  1. https://bitbucket.org/biobakery/humann2/wiki/Home
  2. HUMAnN2 manuscript submitted. HUMAnN1 reference: Abubucker, S. et al. (2012). Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Computational Biology 8(6): e1002358. doi:10.1371/journal.pcbi.1002358

MetaPath (2015)

  1. http://www.cbcb.umd.edu/software/metapath 

Tax4Fun (2015)

  1. http://tax4fun.gobics.de/
  2. Aßhauer, K.P., Wemheuer, B., Daniel, R. and Meinicke, P. (2015). Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics 31(17), 2882–2884. doi: 10.1093/bioinformatics/btv287

IslandViewer 3/GenomeD3Plot (2015)

  1. http://pathogenomics.sfu.ca/islandviewer
  2. Dhillon, B. et al. (2015). IslandViewer 3: more flexible, interactive genomic island discovery, visualization and analysis. Nucleic Acids Research doi: 10.1093/nar/gkv401

Roary (2015)

  1. http://sanger-pathogens.github.io/Roary
  2. Page, A.J. et al. (2015). Roary: rapid large-scale prokaryote pan genome analysis.  Bioinformatics, 31(22), 3691–3693 doi: 10.1093/bioinformatics/btv421

SUPER-FOCUS (2015)

  1. https://edwards.sdsu.edu/SUPERFOCUS
  2. Silva, G.G.Z., Green, K.T., Dutilh, B.E., Edwards, R.A. (2015). SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data. Bioinformatics 1-8. doi: 10.1093/bioinformatics/btv584

SEAR (2015)

  1. http://computing.bio.cam.ac.uk/sear/SEAR_WEB_PAGE/SEAR.html
  2. Rowe, W. et al. (2015). Search Engine for Antimicrobial Resistance: A Cloud Compatible Pipeline and Web Interface for Rapidly Detecting Antimicrobial Resistance Genes Directly from Sequence Data. PLoS ONE 10(7): e0133492. doi:10.1371/journal.pone.0133492

KvarQ (2014)

  1. http://www.swisstph.ch/kvarq
  2. Steiner A., Stucki D., Coscolla, M., Borrell S, Gagneux S. (2014). KvarQ: targeted and direct variant calling from fastq reads of bacterial genomes BMC Genomics 15:881 doi: 10.1186/1471-2164-15-881

kSNP v2 (2013)

  1. http://sourceforge.net/projects/ksnp/
  2. Gardner, S.N. & Hall, B.G. (2013) When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes. PLoS ONE 8(12): e81760. doi:10.1371/journal.pone.0081760

PICRUSt (2013)

  1. http://picrust.github.io/picrust/
  2. Langille, M.G.I et al., (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology 31(9):814-821. doi:10.1038/nbt.2676
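PICRUSt (and Tax4Fun, with a different reference mapping) predicts a metagenome from 16S counts rather than measuring it. The final arithmetic step can be sketched as follows, assuming the per-OTU 16S copy numbers and gene contents have already been predicted (in PICRUSt those come from ancestral state reconstruction over a reference tree, which is out of scope here). The function name and all table values and gene names below are made-up toy data.

```python
def predict_gene_abundance(otu_counts, copy_number, gene_content):
    """Predict gene-family abundances from 16S OTU counts.

    otu_counts:   OTU -> observed 16S read count
    copy_number:  OTU -> predicted 16S rRNA gene copies per genome
    gene_content: OTU -> {gene family: predicted copies per genome}
    """
    predicted = {}
    for otu, count in otu_counts.items():
        # Correct for multi-copy 16S genes so counts approximate genome counts
        genomes = count / copy_number[otu]
        for gene, copies in gene_content.get(otu, {}).items():
            predicted[gene] = predicted.get(gene, 0.0) + genomes * copies
    return predicted


otu_counts = {"OTU_1": 100, "OTU_2": 30}
copy_number = {"OTU_1": 4, "OTU_2": 1}
gene_content = {
    "OTU_1": {"gene_A": 2},
    "OTU_2": {"gene_A": 1, "gene_B": 3},
}
print(predict_gene_abundance(otu_counts, copy_number, gene_content))
# {'gene_A': 80.0, 'gene_B': 90.0}
```

From raw counts OTU_1 looks more than three times as abundant as OTU_2, but after copy-number correction it is slightly less abundant (25 vs 30 genomes). That correction is why the prediction is not simply counts times gene content.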

SmashCommunity (2010)

  1. http://www.bork.embl.de/software/smash
  2. Arumugam, M., Harrington, E.D., Foerstner, K.U., Raes, J. and Bork, P. (2010). SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics 26(23):2977-2978. doi:10.1093/bioinformatics/btq536

MG-RAST (2008)

  1. http://metagenomics.anl.gov/
  2. Meyer, F. et al. (2008) The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386 doi:10.1186/1471-2105-9-386

SINGLE ISOLATE

The RAST Server (2008)

  1. http://rast.nmpdr.org/
  2. Aziz R.K. et al., (2008). The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9:75 doi:10.1186/1471-2164-9-75

METAGENOMICS BENCHMARKING STUDIES

[[work in progress]] thanks to those who suggested adding this…

  1. Vollmers, J., Wiegand, S. & Kaster, A-K. (2017). Comparing and Evaluating Metagenome Assembly Tools from a Microbiologist’s Perspective - Not Only Size Matters! PLoS One 12(1): e0169662. https://doi.org/10.1371/journal.pone.0169662.
  2. A comparative study of metagenomics analysis pipelines at the species level. Yee Voan Teo, Nicola Neretti doi: https://doi.org/10.1101/081141 
  3. Siegwald L, Touzet H, Lemoine Y, Hot D, Audebert C, Caboche S. Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics. PLoS One. 2017 Jan 4;12(1):e0169563. doi: 10.1371/journal.pone.0169563
  4. Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software. http://biorxiv.org/content/early/2017/01/09/099127.article-metrics
  5. Peabody, M. A., Van Rossum, T., Lo, R., & Brinkman, F. S. L. (2015). Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities. BMC Bioinformatics, 16(1), 363. doi:10.1186/s12859-015-0788-5
  6. Lindgreen, S., Adair, K. L. & Gardner, P. P. An evaluation of the accuracy and speed of metagenome analysis tools. Sci Rep. (2016). http://www.nature.com/articles/srep19233
  7. Critical Assessment of Metagenomic Interpretation (2015) http://www.cami-challenge.org/faq 
  8. Oulas, A. et al. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform. Biol. Insights 9, 75–88 (2015). http://dx.doi.org/10.4137%2FBBI.S12462
  9. Garcia-Etxebarria K, Garcia-Garcerà M, Calafell F (2014) Consistency of metagenomic assignment programs in simulated and real data. BMC Bioinformatics.
  10. Sun, Y. et al. A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. Brief. Bioinform. 13, 107–21 (2012).
  11. Bazinet, A. L. & Cummings, M. P. A comparative evaluation of sequence classification programs. BMC Bioinformatics 13, 92 (2012).
  12. Martin, J., Sykes, S., Young, S., Kota, K., Sanka, R., Sheth, N., Orvis, J., Sodergren, E., Wang, Z., Weinstock, G. M., and Mitreva, M. (2012). Optimizing Read Mapping to Reference Genomes to Determine Composition and Species Prevalence in Microbial Communities. PLoS ONE, 7(6):e36427.
  13. Awad, S., Irber, L., & Brown, C. T. (2017). Evaluating Metagenome Assembly on a Simple Defined Community with Many Strain Variants. bioRxiv. https://doi.org/10.1101/155358

COMPARATIVE METAGENOMICS

  1. Jing G, Sun Z, Wang H, Gong Y, Huang S, Ning K, Xu J, Su X. Parallel-META 3: Comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities. Sci Rep. 2017 Jan 12;7:40371. doi: 10.1038/srep40371. PubMed PMID: 28079128; PubMed Central PMCID: PMC5227994.
  2. Ban Y, An L, Jiang H. Investigating microbial co-occurrence patterns based on metagenomic compositional data. Bioinformatics. 2015 Oct 15;31(20):3322-9. doi: 10.1093/bioinformatics/btv364. Epub 2015 Jun 16. PubMed PMID: 26079350; PubMed Central PMCID: PMC4795632.
  3. Ondov BD, Treangen TJ, Mallonee AB, Bergman NH, Koren S, Phillippy AM. “Fast genome and metagenome distance estimation using MinHash”, doi: http://dx.doi.org/10.1101/029827
      1. MASH (2015): http://mash.readthedocs.org/en/latest/
  4. McMurdie, P. J. & Holmes, S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput. Biol. 10, e1003531 (2014).
  5. Paulson, J. N., Stine, O. C., Bravo, H. C. & Pop, M. Differential abundance analysis for microbial marker-gene surveys. Nat. Methods 10, 1200–2 (2013).
  6. Evans, S. N. & Matsen, F. A. The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples. J. R. Stat. Soc. Ser. B Stat. Methodol. 74, 569–592 (2012).
      1. http://matsen.fhcrc.org/pplacer/  (see “guppy” tool)
      2. https://liorpachter.wordpress.com/2013/09/18/unifrac-revealed/#more-471
  7. Huson, D. H., Richter, D. C., Mitra, S., Auch, A. F., and Schuster, S. C. (2009). Methods for comparative metagenomics. BMC Bioinformatics, 10(Suppl 1):S12.
  8. Rodriguez-Brito, B., Rohwer, F., and Edwards, R. A. (2006). An application of statistics to comparative metagenomics. BMC Bioinformatics, 7(1):162.
  9. Tringe, S. G., Von Mering, C., Kobayashi, A., Salamov, A. A., Chen, K., Chang, H. W., Podar, M., Short, J. M., Mathur, E. J., Detter, J. C., et al. (2005). Comparative metagenomics of microbial communities. Science, 308(5721):554–557.
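Mash (Ondov et al., listed above) estimates the Jaccard index between two sequences from small "bottom-s" MinHash sketches and converts it into a distance that approximates mutation rate: D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard index and k is the k-mer size. A minimal sketch of that idea follows. It uses BLAKE2b from Python's `hashlib` as a stand-in hash (Mash itself uses MurmurHash3) and skips Mash's p-values and file formats; the function names are hypothetical.

```python
import hashlib
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Strand-independent k-mers: the smaller of each k-mer and its reverse complement."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        kmers.add(min(kmer, kmer.translate(COMPLEMENT)[::-1]))
    return kmers


def hash64(kmer):
    # Stand-in 64-bit hash; not the hash function Mash itself uses
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hash values."""
    return sorted({hash64(kmer) for kmer in canonical_kmers(seq, k)})[:s]


def jaccard_estimate(sketch_a, sketch_b, s=1000):
    """Fraction of the bottom-s hashes of the union that both sketches share."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:s]
    if not union_bottom:
        return 0.0
    shared = a & b
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def mash_distance(j, k=21):
    """Mash-style distance; reported as 1.0 when no hashes are shared."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

Sketch size trades accuracy for speed: the Jaccard estimate rests on only s hashes, so at small s, low similarities (few shared hashes) are noisy.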

METAGENOMICS REFERENCE DATASETS

too few of these exist

  1. Critical Assessment of Metagenomic Interpretation (2015) http://www.cami-challenge.org/faq 
  2. Mende, D. R., Waller, A. S., Sunagawa, S., Järvelin, A. I., Chan, M. M., Arumugam, M., Raes, J., and Bork, P. (2012). Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data. PLoS ONE, 7(2):e31386.
  3. Bokulich, N. A., Rideout, J. R., Mercurio, W. G., Wolfe, B., Maurice, C. F., Dutton, R. J., ... & Caporaso, J. G. (2016). mockrobiota: a public resource for microbiome bioinformatics benchmarking (No. e2065v1). PeerJ Preprints.
  4. Shakya, M., Quince, C., Campbell, J. H., Yang, Z. K., Schadt, C. W. and Podar, M. (2013), Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities. Environ Microbiol, 15: 1882–1899. doi:10.1111/1462-2920.12086

MICROBIOME / METAGENOMICS STANDARDS

Alliances

  1. https://microbialstandards.org
  2. http://www.microbiome-standards.org/#
  3. The Microbiome Quality Control project https://www.mbqc.org
  4. Genomics Standards Consortium https://press3.mcs.anl.gov/gensc/

Papers

[[articles for genomics, metagenomics and/or microbial forensics standards]]

  1. Sinha, Rashmi, et al. "The microbiome quality control project: baseline study design and future directions." Genome Biology 16.1 (2015): 276.
      1. The Microbiome Quality Control project http://www.mbqc.org
  2. Budowle, B. et al. Validation of high throughput sequencing and microbial forensics applications. Investig. Genet. 5, 9 (2014).
  3. J. T. Ladner et al., Standards for sequencing viral genomes in the era of high-throughput sequencing. MBio. 5, e01360–14 (2014).

Metagenomics Assembly Tools

  1. Nurk, S., Meleshko, D., Korobeynikov, A., & Pevzner, P.A. (2017). metaSPAdes: a new versatile metagenomic assembler. Genome Research 27(5):824-834, doi:10.1101/gr.213959.116.
  2. Antipov D, Hartwick N, Shen M, Raiko M, Lapidus A, Pevzner PA. plasmidSPAdes: assembling plasmids from whole genome sequencing data. Bioinformatics. 2016 Nov 15;32(22):3380-3387. Epub 2016 Jul 27. PubMed PMID: 27466620. https://doi.org/10.1093/bioinformatics/btw493
      1. http://spades.bioinf.spbau.ru/plasmidSPAdes/
  3. Afiahayati, Sato, K., & Sakakibara, Y. (2015). MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning. DNA Research 22(1):69-77, doi: 10.1093/dnares/dsu041
  4. Luo, C. et al. ConStrains identifies microbial strains in metagenomic datasets. Nat Biotech advance online publication (2015). http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3319.html
      1. https://bitbucket.org/luo-chengwei/constrains
  5. Cleary et al. Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning. Nature Biotechnology 33, 1053–1060 (2015) doi:10.1038/nbt.3329
      1. Latent Strain Analysis (2015)
      2. http://latentstrainanalysis.readthedocs.org/en/latest/
      3. Looks at covariance relationships between k-mers.
  6. Li D. et al. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. (2015)
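Latent Strain Analysis groups k-mers whose abundances co-vary across many samples, on the idea that k-mers from the same strain rise and fall together. LSA itself does this at scale with k-mer hashing and a streaming SVD ("eigengenomes"); the toy sketch below swaps that for plain pairwise Pearson correlation plus single-linkage (union-find) grouping, which only works for tiny inputs. `covarying_groups`, its `min_r` threshold, and the toy strain sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_counts(reads, k):
    """Count every k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def pearson(x, y):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def covarying_groups(samples, k, min_r=0.9):
    """Group k-mers whose abundance profiles across samples correlate >= min_r.

    samples: list of samples, each a list of read sequences.
    """
    per_sample = [kmer_counts(reads, k) for reads in samples]
    vocab = sorted(set().union(*per_sample))
    profiles = {km: [counts[km] for counts in per_sample] for km in vocab}

    parent = {km: km for km in vocab}  # union-find over k-mers

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(vocab, 2):  # O(n^2) pairs: toy inputs only
        if pearson(profiles[a], profiles[b]) >= min_r:
            parent[find(a)] = find(b)

    groups = {}
    for km in vocab:
        groups.setdefault(find(km), []).append(km)
    return sorted(groups.values(), key=len, reverse=True)


strain_a, strain_b = "ACGTTGCA", "GGGCCCTT"
samples = [
    [strain_a] * 5 + [strain_b],      # strain_a dominant
    [strain_a] + [strain_b] * 5,      # strain_b dominant
    [strain_a] * 3 + [strain_b] * 3,  # equal mix
]
for group in covarying_groups(samples, k=4):
    print(group)
```

Here the two toy strains share no 4-mers and their abundances move in opposite directions across samples, so the output is two groups of five k-mers, one per strain.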

[Contig or Otherwise] Deduplication tools

Thanks Ann Gregory for adding this section!

  1. Olm, M.R., Brown, C.T., Brooks, B., Banfield, J.F. (2017). dRep: A tool for fast and accurate genome de-replication that enables tracking of microbial genotypes and improved genome recovery from metagenomes. bioRxiv (preprint). 108142. DOI: 10.1101/108142

Commercial

Probably missed some… probably should start breaking these into separate categories… but… holy crap, that’s a lot of VC funding…

Below is a list of _commercial_ organizations that produce or are developing therapeutics, diagnostics, B2B and B2C products or services specialized in the microbiome space.

  1. Advanced Biological Laboratories (ABL)
  2. American Gut Project
  3. AppliedMaths
  4. Ardigen
  5. AsiaBiome
  6. Astarte Medical
  7. AOBiome
  8. Aperiomics
  9. Biome Bliss
  10. Bio-Me
  11. The BioCollective
  12. CD Genomics
  13. CeMeT GmbH
  14. ChunLabs
  15. Clinical Metagenomics A/S
  16. CosmosID
  17. DayTwo
  18. Diversigen
  19. DuPont Nutrition & Health
  20. Eligo Bioscience
  21. Enterome Bioscience
  22. Ferring Therapeutics
  23. Floragraph
  24. FryLabs
  25. IDbyDNA
  26. ID Genomics
  27. Kaleido Biosciences
  28. Karius
  29. Maat Pharma
  30. Metagenome Analytics, LLC
  31. Metabiota
  32. Metabolon
  33. Microbiome Insights
  34. MicroBiome Therapeutics
  35. MicrogenDX
  36. NatureMetrics
  37. Noblis
  38. Noscendo
  39. One Codex
  40. OpenBiome
  41. QIAGEN
  42. Phylagen 
  43. RealTime Genomics
  44. ReBiotix
  45. Reckitt Benckiser
  46. Second Genome
  47. Seres Therapeutics
  48. Shoreline Biome
  49. Signature Science
  50. Siolta Therapeutics
  51. Taconic
  52. Takeda Pharmaceuticals
  53. Thryve
  54. TGEN
  55. uBiome
  56. Vedanta Biosciences
  57. Viome 
  58. WholeBiome 

MISC. Other stuff below…

BIOSURVEILLANCE / (Meta)GENOMICS REVIEW ARTICLES

This is a (growing) list of relevant review articles on the use of metagenomics / genomics in biosurveillance and/or clinical diagnostics. [[NEEDS TO BE UPDATED!!!]]

Pavian (2016)

  1. Breitwieser FP, Salzberg SL. Pavian: Interactive analysis of metagenomics data for microbiomics and pathogen identification. bioRxiv 084715; doi: https://doi.org/10.1101/084715 (2016)
  2. Lefterova, M. I., Suarez, C. J., Banaei, N. & Pinsky, B. A. Next-Generation Sequencing for Infectious Disease Diagnosis and Management: A Report of the Association for Molecular Pathology. J. Mol. Diagn. 17 (2015).
  3. Franzosa, E. A. et al. Sequencing and beyond: integrating molecular ‘omics’ for microbial community profiling. Nat. Rev. Microbiol. 13, 360–372 (2015).
  4. Madoff, L.C., and Li, A. (2014). Web-Based Surveillance Systems for Human, Animal and Plant Diseases. Microbiol. Spectr. 2, OH–0015–2012.
  5. Lipkin, W.I. (2013). The changing face of pathogen discovery and surveillance. Nat. Rev. Microbiol. 11, 133–141.
  6. Tegos, G.P. (2013). Biodefense: trends and challenges in combating biological warfare agents. Virulence 4, 740–744.
  7. Valdivia-Granda, W. A. (2013). Biosurveillance enterprise for operational awareness, a genomic-based approach for tracking pathogen virulence. Virulence 4, 745–751.
  8. Miller, R.R., Montoya, V., Gardy, J.L., Patrick, D.M., and Tang, P. (2013). Metagenomics for pathogen detection in public health. Genome Med. 5, 81.
  9. Kaydos-Daniels, S.C., Rojas Smith, L., and Farris, T.R. (2013). Biosurveillance in outbreak investigations. Biosecur. Bioterror. 11, 20–28.
  10. Kman, N.E., and Bachmann, D.J. (2012). Biosurveillance: a review and update. Adv. Prev. Med. 2012, 301408.
  11. Russell, K.L., Rubenstein, J., Burke, R.L., Vest, K.G., Johns, M.C., Sanchez, J.L., Meyer, W., Fukuda, M.M., and Blazes, D.L. (2011). The Global Emerging Infection Surveillance and Response System (GEIS), a U.S. government tool for improved global biosurveillance: a review of 2009. BMC Public Health 11 Suppl 2, S2.

GENOMICS DATA COMPRESSION / STREAMING

When gzip just isn’t enough… Also see: http://omictools.com/data-compression-c383-p1.html

  1. Roguski, Ł., & Ribeca, P. (2015). CARGO: Effective format-free compressed storage of genomic information. Retrieved from http://arxiv.org/abs/1506.05185
  2. Y. Zhang et al., Light-weight reference-based compression of FASTQ data. BMC Bioinformatics. 16, 188 (2015).
  3. S. Pathak, S. Rajasekaran, LFQC: a lossless compression algorithm for FASTQ files. Bioinformatics (2014), doi:10.1093/bioinformatics/btu701.
  4. J. K. Bonfield, M. V. Mahoney, Compression of FASTQ and SAM format sequencing data. PLoS One. 8, e59190 (2013).
  5. (REVIEW OF PRIOR METHODS) S. Deorowicz, S. Grabowski, Data compression for sequencing data. Algorithms Mol. Biol. 8, 25 (2013).
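The simplest step beyond gzip is to stop storing bases as 8-bit ASCII characters: A, C, G and T fit in 2 bits each, a 4x reduction before any modeling. The tools above go much further (reference-based encoding, separate models for read names and quality scores, which are typically the hardest part of a FASTQ file to compress); this minimal sketch shows only the 2-bit building block and assumes an ACGT-only sequence with no N bases. `pack` and `unpack` are hypothetical names.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an ACGT-only sequence into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        # Base i goes into byte i // 4 at bit offset 2 * (i % 4)
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data, length):
    """Inverse of pack; length is needed because the last byte may be partial."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


packed, n = pack("GATTACA")
print(len(packed), unpack(packed, n))  # 2 GATTACA
```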