MINAS Round One Consensus Summary

	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q
1	Structured comment name	Item (rdfs:label)	Definition	Expected value	Value syntax	Example	Section	MINAS Microbiome	MINAS Pathogen	MINAS sedaDNA	MINAS FINAL Consensus	Preferred unit	Occurrence	MIXS ID	MINAS Microbiome Comment	MINAS Pathogen Comment	MINAS sedaDNA Comment
2	samp_name	sample name	A local identifier or name for the material sample used for extracting nucleic acids, and subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. Use of a globally unique identifier for the field source_mat_id is recommended in addition to sample_name.	text	{text}	ISDsoil1	investigation	M	M	M	M		1	MIXS:0001107
3	samp_taxon_id	Taxonomy ID of DNA sample	NCBI taxon id of the sample. May be a single taxon or mixed taxa sample. Use 'synthetic metagenome' for mock community/positive controls, or 'blank sample' for negative controls.	Taxonomy ID	{text} [NCBI:txid]	Gut Metagenome [NCBI:txid749906]	investigation	M	C/X	M	CONFLICT		1	MIXS:0001320		Problem: SG and capture data contain many microbes, not just studied microbe
4	project_name	project name	Name of the project within which the sequencing was organized		{text}	Forest soil metagenome	investigation	M	M	M	M		1	MIXS:0000092	Guidance: Name of the first project the sample is associated to		Guidance: Name of the first project the sample is associated to
5	lat_lon	geographic location (latitude and longitude)	The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system	decimal degrees, limit to 8 decimal points	{float} {float}	50.586825 6.408977	environment	C	C	C	C		1	MIXS:0000009	Condition: at least one of lat long or geo_loc_name	Condition: excavation/recovery coordinates known	Condition: at least one of lat long or geo_loc_name
6	depth	depth	The vertical distance below local surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples.	measurement value	{float} {unit}	10 meter	environment	-	C	-	CONFLICT				Condition: excavation/recovery depth known	Condition: excavation/recovery depth known
7	elev	elevation	Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit.	measurement value	{float} {unit}	100 meter	environment	X	C	X	CONFLICT		1	MIXS:0000093	Condition: for mummy samples or rock shelters (can be inferred from map/topology)	Condition: excavation/recovery coordinates known	Condition: for mummy samples or rock shelters (can be inferred from map/topology)
8	temp	temperature	Temperature of the sample at the time of sampling.	measurement value	{float} {unit}	25 degree Celsius	environment	-	C	-	CONFLICT					Modified meaning: storage temperature and duration; Condition: storage temperature and duration known
9	geo_loc_name	geographic location (country and/or sea,region)	The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (http://purl.bioontology.org/ontology/GAZ)	country or sea name (INSDC or GAZ): region(GAZ), specific location name	{term}: {term}, {text}	USA: Maryland, Bethesda	environment	C	M	C	CONFLICT		1	MIXS:0000010	Condition: either lat long or geo_loc_name	Condition: excavation/recovery depth known	Condition: either lat long or geo_loc_name
10	collection_date	collection date	The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are ISO8601 compliant	date and time	{timestamp}	2018-05-11T10:00:00+01:00; 2018-05-11	environment	C	C	C	C		1	MIXS:0000011	Problem: see comments regarding the different dates 'entry'. However we feel can be further defined as - Guidance: removal from main body of sample, later used for extraction; drilling; removal of calculus from tooth; or if not direct sampling from collection, date of removal from collection to take to lab)	Problem: need excavation and collection date; timestamp format not applicable to many excavations	Problem: see comments regarding the different dates 'entry'. However we feel can be further defined as - Guidance: removal from main body of sample, later used for extraction; drilling; removal of calculus from tooth; or if not direct sampling from collection, date of removal from collection to take to lab)
11	neg_cont_type	negative control type	The substance or equipment used as a negative control in an investigation	enumeration or text	[distilled water\|phosphate buffer\|empty collection device\|empty collection tube\|DNA-free PCR mix\|sterile swab \|sterile syringe]		investigation	M	C	M	CONFLICT		1	MIXS:0001321		Condition: use of negative control
12	pos_cont_type	positive control type	The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive.		{term} or {text}		investigation	-	C/X	-	CONFLICT					Condition: use of positive control
13	env_broad_scale	broad-scale environmental context	Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS	The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes.	{termLabel} {[termID]}	oceanic epipelagic zone biome [ENVO:01000033] for annotating a water sample from the photic zone in middle of the Atlantic Ocean	environment	X	C/X	X	X		1	MIXS:0000012	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.	Condition: coordinates known; Problem: landscapes change through time	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.
14	env_local_scale	local environmental context	Report the entity or entities which are in the sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS.	Environmental entities having causal influences upon the entity at time of sampling.	{termLabel} {[termID]}	litter layer [ENVO:01000338]; Annotating a pooled sample taken from various vegetation layers in a forest consider: canopy [ENVO:00000047]\|herb and fern layer [ENVO:01000337]\|litter layer [ENVO:01000338]\|understory [ENVO:01000335]\|shrub layer [ENVO:01000336].	environment	X	C/X	X	X		1	MIXS:0000013	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.	Guidance: anatomical site sampled	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.
15	env_medium	environmental medium	Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS . Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top).	The material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483].	{termLabel} {[termID]}	soil [ENVO:00001998]; Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond consider: pond water [ENVO:00002228]\|air [ENVO:00002005]	environment	X	C/X	X	X		1	MIXS:0000014	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.	Modified meaning: burial environment (e.g. directly interred, stone tomb, cave)	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.
16	subspecf_gen_lin	subspecific genetic lineage	Information about the genetic distinctness of the sequenced organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123.	Genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, variety, cultivar.	{rank name}:{text}	serovar:Newport	nucleic acid sequence source	-	C	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
17	ploidy	ploidy	The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated gene and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO	PATO	{termLabel} {[termID]}	allopolyploidy [PATO:0001379]	nucleic acid sequence source	-	C/X	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
18	num_replicons	number of replicons	Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote	for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments	{integer}	2	nucleic acid sequence source	-	X	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
19	extrachrom_elements	extrachromosomal elements	Do plasmids exist of significant phenotypic consequence (e.g. ones that determine virulence or antibiotic resistance). Megaplasmids? Other plasmids (borrelia has 15+ plasmids)	number of extrachromosomal elements	{integer}	5	nucleic acid sequence source	-	X	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
20	ref_biomaterial	reference for biomaterial	Primary publication if isolated before genome publication; otherwise, primary genome report.	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	doi:10.1016/j.syapm.2018.01.009	nucleic acid sequence source	-	C	-	CONFLICT					Condition: sample has been sequenced before
21	source_mat_id	source material identifiers	A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14' , referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2).	for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier	{text}	MPI012345	nucleic acid sequence source	M	M	M	M		m	MIXS:0000026	Guidance: ID should refer to the institute of origin of the sample (archaeological or Museum code) NOT the project lab code (exceptions: blanks)	Guidance: archaeological or museum ID	Guidance: ID should refer to the institute of origin of the sample (archaeological or Museum code) NOT the project lab code (exceptions: blanks)
22	specific_host	host scientific name	Report the host's taxonomic name and/or NCBI taxonomy ID.	host scientific name, taxonomy ID	{text}\|{NCBI taxid}	Homo sapiens and/or 9606	nucleic acid sequence source	M	M	M	M		1	MIXS:0000029
23	host_disease_stat	host disease status	List of diseases with which the host has been diagnosed; can include multiple diagnoses. The value of the field depends on host; for humans the terms should be chosen from the DO (Human Disease Ontology) at https://www.disease-ontology.org, non-human host diseases are free text	disease name or Disease Ontology term	{termLabel} {[termID]}\|{text}	rabies [DOID:11260]	nucleic acid sequence source	X	X	X	X		m	MIXS:0000031	Condition: Should be given if observed, but this depends on type and completeness of remains (yes calculus, but if more than just single tooth; can't be done for palaeofaeces)	Problem: decide if this should refer to skeleton pathology or all pathogen hits in the data	Condition: Should be given if observed, but this depends on type and completeness of remains (yes calculus, but if more than just single tooth; can't be done for palaeofaeces)
24	samp_collec_method	sample collection method	The method employed for collecting the sample.	PMID,DOI,url , or text	{PMID}\|{DOI}\|{URL}\|{text}	swabbing	nucleic acid sequence source	M	M	M	M		1	MIXS:0001225		Guidance: DOI or description of sampling method; be mindful of replicate protocol fields
25	samp_mat_process	sample material processing	A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed.	text	{text}	filtering of seawater, storing samples in ethanol	nucleic acid sequence source	X	-	X	CONFLICT		1	MIXS:0000016	Problem: See comments re storage conditions and treatment: we may propose to expand these into multiple fields	Problem: SG and capture data contain many microbes, not just studied microbe	Problem: See comments re storage conditions and treatment: we may propose to expand these into multiple fields
26	size_frac	size fraction selected	Filtering pore size used in sample preparation	filter size value range	{float}-{float} {unit}	0-0.22 micrometer	nucleic acid sequence source	X	-	X	CONFLICT		1	MIXS:0000017		Problem: SG and capture data contain many microbes, not just studied microbe
27	samp_size	amount or size of sample collected	The total amount or size (volume (ml), mass (g) or area (m2) ) of sample collected.	measurement value	{float} {unit}	5 liter	nucleic acid sequence source	M	-	M	CONFLICT	milliliter, gram, milligram, liter	1	MIXS:0000001	Problem: Require a value for 'no detection' - calculus can be unweighable	Condition: sample has been sequenced before	Problem: Require a value for 'no detection' - calculus can be unweighable
28	samp_vol_we_dna_ext	sample volume or weight for DNA extraction	Volume (ml) or mass (g) of total collected sample processed for DNA extraction. Note: total sample collected should be entered under the term Sample Size (MIXS:0000001).	measurement value	{float} {unit}	1500 milliliter	nucleic acid sequence source	M	M	M	M	milliliter, gram, milligram, square centimeter	1	MIXS:0000111	Problem: Require a value for 'no detection' - calculus can be unweighable	Guidance: mass of tissue used for extraction	Problem: Require a value for 'no detection' - calculus can be unweighable
29	virus_enrich_appr	virus enrichment approach	List of approaches used to enrich the sample for viruses, if any	enumeration	[filtration\|ultrafiltration\|centrifugation\|ultracentrifugation\|PEG Precipitation\|FeCl Precipitation\|CsCl density gradient\|DNAse\|RNAse\|targeted sequence capture\|other\|none]	filtration + FeCl Precipitation + ultracentrifugation + DNAse	nucleic acid sequence source	-	?	-	CONFLICT					Modified meaning: any enrichment approach used on dataset, not just for viruses
30	nucl_acid_ext	nucleic acid extraction	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf	sequencing	M	M	M	M		1	MIXS:0000037
31	nucl_acid_amp	nucleic acid amplification	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	https://phylogenomics.me/protocols/16s-pcr-protocol/	sequencing	M	C	M	CONFLICT		1	MIXS:0000038		Condition: PCR-based study
32	lib_reads_seqd	library reads sequenced	Total number of clones sequenced from the library	number of reads sequenced	{integer}	20	sequencing	-	M	-	CONFLICT					Guidance: number of reads sequenced, not clones
33	lib_layout	library layout	Specify whether to expect single, paired, or other configuration of reads	enumeration	[paired\|single\|vector\|other]	paired	sequencing	M	M	M	M		1	MIXS:0000041
34	lib_screen	library screening strategy	Specific enrichment or screening methods applied before and/or after creating libraries	screening strategy name	{text}	enriched, screened, normalized	sequencing	C	M	C	CONFLICT		1	MIXS:0000043		Question: there is already a field for this in general upload. Do we need duplicate entries?
35	target_gene	target gene	Targeted gene or locus name for marker gene studies	gene name	{text}	16S rRNA, 18S rRNA, nif, amoA, rpo	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: PCR-based study
36	target_subfragment	target subfragment	Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA	gene fragment name	{text}	V6, V9, ITS	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: PCR-based study
37	pcr_primers	pcr primers	PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters	FWD: forward primer sequence;REV:reverse primer sequence	FWD:{dna};REV:{dna}	FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: PCR-based study
38	mid	multiplex identifiers	Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters	multiplex identifier sequence	{dna}	GTGAATAT	sequencing	M	X	M	CONFLICT		1	MIXS:0000047
39	adapters	adapters	Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters	adapter A and B sequence	{dna};{dna}	AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT	sequencing	M	X	M	CONFLICT		1	MIXS:0000048	Problem: adapters meant to be removed already if uploading to ENA; but can we trust that?		Problem: adapters meant to be removed already if uploading to ENA; but can we trust that?
40	pcr_cond	pcr conditions	Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...'	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35	sequencing	-	C	-	CONFLICT					Condition: PCR-based study
41	seq_meth	sequencing method	Sequencing machine used. Where possible the term should be taken from the OBI list of DNA sequencers (http://purl.obolibrary.org/obo/OBI_0400103).	Text or OBI	{termLabel} {[termID]}\|{text}	454 Genome Sequencer FLX [OBI:0000702]	sequencing	M	M	M	M		1	MIXS:0000050
42	chimera_check	chimera check software	Tool(s) used for chimera checking, including version number and parameters, to discover and remove chimeric sequences. A chimeric sequence is comprised of two or more phylogenetically distinct parent sequences.	name and version of software, parameters used	{software};{version};{parameters}	uchime;v4.1;default parameters	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: performed chimera assessment
43	tax_ident	taxonomic identity marker	The phylogenetic marker(s) used to assign an organism name to the SAG or MAG	enumeration	[16S rRNA gene\|multi-marker approach\|other]	other: rpoB gene	sequencing	C	X	C	CONTEXT SPECIFIC		1	MIXS:0000053	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
44	assembly_qual	assembly quality	The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft:Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated	enumeration	[Finished genome\|High-quality draft genome\|Medium-quality draft genome\|Low-quality draft genome\|Genome fragment(s)]	High-quality draft genome	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000056	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
45	assembly_name	assembly name	Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community	name and version of assembly	{text} {text}	HuRef, JCVI_ISG_i3_1.0	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000057	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
46	assembly_software	assembly software	Tool(s) used for assembly, including version number and parameters	name and version of software, parameters used	{software};{version};{parameters}	metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000058	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
47	annot	annotation	Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter	name of tool or pipeline used, or annotation source description	{text}	prokka	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000059	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: annotated assembly	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
48	number_contig	number of contigs	Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG	value	{integer}	40	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000060	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
49	feat_pred	feature prediction	Method used to predict UViGs features such as ORFs, integration site, etc.	names and versions of software(s), parameters used	{software};{version};{parameters}	Prodigal;2.6.3;default parameters	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000061	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
50	ref_db	reference database(s)	List of database(s) used for ORF annotation, along with version number and reference to website or publication	names, versions, and references of databases	{database};{version};{reference}	pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000062	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
51	sim_search_meth	similarity search method	Tool used to compare ORFs with database, along with version and cutoffs used	names and versions of software(s), parameters used	{software};{version};{parameters}	HMMER3;3.1b2;hmmsearch, cutoff of 50 on score	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000063	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
52	tax_class	taxonomic classification	Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes	classification method, database name, and other parameters	{text}	vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters)	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000064	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of taxonomic classification are not observable in raw data upload; is this a necessary field?	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
53	16s_recover	16S recovered	Can a 16S gene be recovered from the submitted SAG or MAG?	boolean	{boolean}	yes	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000065	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: 16S rRNA data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
54	16s_recover_software	16S recovery software	Tools used for 16S rRNA gene extraction	names and versions of software(s), parameters used	{software};{version};{parameters}	rambl;v2;default parameters	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000066	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: 16S rRNA data; Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
55	trnas	number of standard tRNAs extracted	The total number of tRNAs identified from the SAG or MAG	value from 0-21	{integer}	18	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000067	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
56	trna_ext_software	tRNA extraction software	Tools used for tRNA identification	names and versions of software(s), parameters used	{software};{version};{parameters}	infernal;v2;default parameters	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000068	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
57	compl_score	completeness score	Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. High Quality Draft: >90%, Medium Quality Draft: >50%, and Low Quality Draft: < 50% should have the indicated completeness scores	quality;percent completeness	[high\|med\|low];{percentage}	med;60%	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000069	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
58	compl_software	completeness software	Tools used for completion estimate, i.e. checkm, anvi'o, busco	names and versions of software(s) used	{software};{version}	checkm	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000070	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
59	compl_appr	completeness approach	The approach used to determine the completeness of a given genomic assembly, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome	text	[marker gene\|reference based\|other]	other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83)	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000071	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
60	contam_score	contamination score	The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable for: High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases	value	{float} percentage	1.00%	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000072	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
61	contam_screen_input	contamination screening input	The type of sequence data used as input	enumeration	[reads\| contigs]	contigs	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000005	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
62	contam_screen_param	contamination screening parameters	Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, e.g. kmer and coverage, or reference database and kmer	enumeration;value or name	[ref db\|kmer\|coverage\|combination];{text\|integer}	kmer	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000073	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
63	decontam_software	decontamination software	Tool(s) used in contamination screening	enumeration	[checkm/refinem\|anvi'o\|prodege\|bbtools:decontaminate.sh\|acdc\|combination]	anvi'o	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000074	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
64	bin_param	binning parameters	The parameters that have been applied during the extraction of genomes from metagenomic datasets	enumeration	[homology search\|kmer\|coverage\|codon usage\|combination]	coverage and kmer	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000077	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
65	bin_software	binning software	Tool(s) used for the extraction of genomes from metagenomic datasets, where possible include a product ID (PID) of the tool(s) used.	names and versions of software(s) used	{software};{version}{PID}	MetaCluster-TA (RRID:SCR_004599), MaxBin (biotools:maxbin)	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000078	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
66	reassembly_bin	reassembly post binning	Has an assembly been performed on a genome bin extracted from a metagenomic assembly?	boolean	{boolean}	no	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000079	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
67	mag_cov_software	MAG coverage software	Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets	enumeration	[bwa\|bbmap\|bowtie\|other]	bbmap	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000080	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
68	vir_ident_software	viral identification software	Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used	software name, version and relevant parameters	{software};{version};{parameters}	VirSorter; 1.0.4; Virome database, category 2	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000081	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
69	pred_genome_type	predicted genome type	Type of genome predicted for the UViG	enumeration	[DNA\|dsDNA\|ssDNA\|RNA\|dsRNA\|ssRNA\|ssRNA (+)\|ssRNA (-)\|mixed\|uncharacterized]	dsDNA	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000082	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
70	pred_genome_struc	predicted genome structure	Expected structure of the viral genome	enumeration	[segmented\|non-segmented\|undetermined]	non-segmented	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000083	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
71	detec_type	detection type	Type of UViG detection	enumeration	[independent sequence (UViG)\|provirus (UpViG)]	independent sequence (UViG)	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000084	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
72	otu_class_appr	OTU classification approach	Cutoffs and approach used when clustering “species-level” OTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside OTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis	cutoffs and method used	{ANI cutoff};{AF cutoff};{clustering method}	95% ANI;85% AF; greedy incremental clustering	sequencing	C	X	C	CONTEXT SPECIFIC		1	MIXS:0000085	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
73	otu_seq_comp_appr	OTU sequence comparison approach	Tool and thresholds used to compare sequences when computing "species-level" OTUs	software name, version and relevant parameters	{software};{version};{parameters}	blastn;2.6.0+;e-value cutoff: 0.001	sequencing	C	X	C	CONTEXT SPECIFIC		1	MIXS:0000086	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
74	otu_db	OTU database	Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" OTUs, if any	database and version	{database};{version}	NCBI Viral RefSeq;83	sequencing	C	X	C	CONTEXT SPECIFIC		1	MIXS:0000087	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
75	host_pred_appr	host prediction approach	Tool or approach used for host prediction	enumeration	[provirus\|host sequence similarity\|CRISPR spacer match\|kmer similarity\|co-occurrence\|combination\|other]	CRISPR spacer match	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000088	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
76	host_pred_est_acc	host prediction estimated accuracy	For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature	false discovery rate	{text}	CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048)	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000089	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
77	associated resource	relevant electronic resources	A related resource that is referenced, cited, or otherwise associated to the sequence.	reference to resource	{PMID} \| {DOI} \| {URL}	http://www.earthmicrobiome.org/	sequencing	C	X	C	CONFLICT		m	MIXS:0000091
78	sop	relevant standard operating procedures	Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences	reference to SOP	{PMID}\|{DOI}\|{URL}	http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/	sequencing	C	C	C	C		m	MIXS:0000090		Condition: assembly data; Problem: results of analysis are not observable in raw data upload