However, most analyses (at least preliminary ones) follow the same four-step procedure: (i) we start with a data set, comprising both genetic data and annotations (such as sampling location, language, or subspecies); (ii) in most cases, only a subset of the data is analyzed, either because we want to study a region of interest or because of quality filters; (iii) for each analysis (such as PCA or structure), there are a number of parameters that can be varied for each run; (iv) finally, tables and figures need to be produced.
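The four steps above can be sketched as a chain of functions; all names, file formats, and return values here are hypothetical stand-ins, not part of any existing tool.

```python
# Illustrative sketch of the four-step procedure; every function and
# file name below is a hypothetical placeholder.

def load_data(genotype_file, annotation_file):
    """Step 1: read genotypes plus annotations (location, language, ...)."""
    genotypes = {"sample1": [0, 1, 2], "sample2": [1, 1, 0]}  # stand-in data
    annotations = {"sample1": {"location": "A"}, "sample2": {"location": "B"}}
    return genotypes, annotations

def subset_data(genotypes, keep):
    """Step 2: restrict to a region of interest or QC-passing samples."""
    return {s: g for s, g in genotypes.items() if s in keep}

def run_analysis(genotypes, n_components=2):
    """Step 3: run an analysis (e.g. PCA) with run-specific parameters."""
    return {"method": "pca", "k": n_components, "n_samples": len(genotypes)}

def make_report(result):
    """Step 4: produce tables and figures from the analysis output."""
    return f"{result['method']}: {result['n_samples']} samples, k={result['k']}"
```

Chaining the steps (`make_report(run_analysis(subset_data(...)))`) makes explicit that each run is fully determined by the input data, the subset, and the per-run parameters.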
While some standards exist for the genetic data (e.g. plink, vcf, glf), additional data that is often useful, such as sampling location, language, ecotype, or population, is not standardized, and each tool requires its own input format. Established interfaces could streamline this process. The problem is compounded because it is often necessary to investigate the impact of QC choices, analyze subsets of the data, or add new data to an analysis. While many labs will have set up their own pipeline for this purpose, standardization has a few benefits: first, the impact of bugs may be reduced; second, the barrier to entry is lowered; third, similar alternative methods (e.g. tess3 vs. admixture vs. structure) can easily be compared, and the most useful method for a given data set applied.
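One way to picture such an interface is a single standardized annotation table from which tool-specific inputs are generated. The column names and output layouts below are assumptions for illustration, not an existing standard; the `.pop`-style and coordinate-file layouts are only rough approximations of what tools like admixture or EEMS expect.

```python
# Sketch: one annotation table, many tool-specific output formats.
import csv
import io

# Hypothetical standardized annotation table (CSV).
SAMPLES = """sample,population,latitude,longitude
ind1,popA,47.3,8.5
ind2,popA,47.4,8.6
ind3,popB,52.5,13.4
"""

def read_annotations(text):
    """Parse the shared annotation table into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def to_pop_labels(rows):
    """One population label per line, in sample order (a .pop-style file)."""
    return "\n".join(r["population"] for r in rows)

def to_coords(rows):
    """One 'longitude latitude' pair per line (an EEMS-style coordinate file)."""
    return "\n".join(f'{r["longitude"]} {r["latitude"]}' for r in rows)
```

Adding a new tool then means adding one converter function, rather than re-curating the metadata by hand.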
Benefits for method developers:
Standards allow for easier performance comparison. In the spirit of dynamic statistical comparisons (https://stephenslab.github.io/dsc-wiki/), easy access to multiple methods makes it straightforward to investigate the behavior of new methods in a wide array of circumstances. It will also facilitate the adoption of new methods if they are easily accessible in a framework that is already in use.
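The comparison benefit hinges on all methods sharing one calling convention. The sketch below is purely illustrative (the runners are stubs returning fabricated numbers, and ranking methods by a single score is a simplification), but it shows why a shared signature turns method comparison into a loop.

```python
# Toy illustration: with a shared interface, swapping or comparing
# methods is trivial. All runners here are stubs with made-up output.

def run_admixture(genotypes, K):
    return {"method": "admixture", "K": K, "loglik": -120.0}  # stub result

def run_tess3(genotypes, K):
    return {"method": "tess3", "K": K, "loglik": -118.5}      # stub result

# Registry of interchangeable methods with identical signatures.
METHODS = {"admixture": run_admixture, "tess3": run_tess3}

def compare(genotypes, K):
    """Run every registered method on the same data; rank by stub score."""
    results = [fn(genotypes, K) for fn in METHODS.values()]
    return sorted(results, key=lambda r: r["loglik"], reverse=True)
```

A new method joins the comparison by registering one function, with no change to the surrounding pipeline.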
My workflow is managed using Snakemake (http://snakemake.readthedocs.io), which, while based on Python, allows the use of bash, Python, R, and other languages at any stage; the framework is thus largely language agnostic. A skeleton is available at https://github.com/BenjaminPeter/eems-around-the-world-draft.
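A Snakemake workflow in this spirit might look like the following sketch. The rule, path, and parameter names are illustrative and not taken from the linked repository, and the shell commands are schematic (e.g. admixture's actual output-file naming is simplified here).

```python
# Illustrative Snakefile sketch: subsets and per-run parameters are
# wildcards, so varying them means editing two lists.
SUBSETS = ["all", "region1"]   # hypothetical data subsets
K_VALUES = [2, 3]              # admixture parameter varied per run

rule all:
    input:
        expand("figures/{subset}_K{K}.png", subset=SUBSETS, K=K_VALUES)

rule make_subset:
    input: "data/full.vcf.gz"
    output: "subsets/{subset}.bed"
    # schematic plink call; keep-lists define each subset
    shell: "plink --vcf {input} --keep config/{wildcards.subset}.txt "
           "--make-bed --out subsets/{wildcards.subset}"

rule run_admixture:
    input: "subsets/{subset}.bed"
    output: "admixture/{subset}.{K}.Q"
    shell: "admixture {input} {wildcards.K}"  # output handling simplified

rule plot:
    input: "admixture/{subset}.{K}.Q"
    output: "figures/{subset}_K{K}.png"
    # an R plotting step: Snakemake is language agnostic here
    script: "scripts/plot_admixture.R"
```

Because subsets and parameters are wildcards, rerunning with different QC choices or an extra K value only touches the lists at the top, and Snakemake rebuilds exactly the affected outputs.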