MIxS 6 term updates

	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V
1	Structured comment name	Item	Definition	Expected value	Value syntax	Example	Section	migs_eu	migs_ba	migs_pl	migs_vi	migs_org	me	mimarks_s	mimarks_c	misag	mimag	miuvig	Preferred unit	Occurrence	Position	MIXS ID
2	submitted_to_insdc	submitted to insdc	Depending on the study (large-scale e.g. done with next generation sequencing technology, or small-scale) sequences have to be submitted to SRA (Sequence Read Archive), DRA (DDBJ Read Archive) or via the classical Webin/Sequin systems to Genbank, ENA and DDBJ. Although this field is mandatory, it is meant as a self-test field, therefore it is not necessary to include this field in contextual data submitted to databases	boolean	{boolean}	yes	investigation	M	M	M	M	M	M	M	M	M	M	M		1	1	MIXS:0000004
3	investigation_type	investigation type	Nucleic Acid Sequence Report is the root element of all MIGS/MIMS compliant reports as standardized by Genomic Standards Consortium. This field is either eukaryote,bacteria,virus,plasmid,organelle, metagenome,mimarks-survey, mimarks-specimen, metatranscriptome, single amplified genome, metagenome-assembled genome, or uncultivated viral genome	eukaryote, bacteria_archaea, plasmid, virus, organelle, metagenome,mimarks-survey, mimarks-specimen, metatranscriptome, single amplified genome, metagenome-assembled genome, or uncultivated viral genomes	[eukaryote\|bacteria_archaea\|plasmid\|virus\|organelle\|metagenome\|metatranscriptome\|mimarks-survey\|mimarks-specimen\|misag\|mimag\|miuvig]	metagenome	investigation	M	M	M	M	M	M	M	M	M	M	M		1	2	MIXS:0000007
4	project_name	project name	Name of the project within which the sequencing was organized		{text}	Forest soil metagenome	investigation	M	M	M	M	M	M	M	M	M	M	M		1	3	MIXS:0000092
5	experimental_factor	experimental factor	Experimental factors are essentially the variable aspects of an experiment design which can be used to describe an experiment, or set of experiments, in an increasingly detailed manner. This field accepts ontology terms from Experimental Factor Ontology (EFO) and/or Ontology for Biomedical Investigations (OBI). For a browser of EFO (v 2.95) terms, please see http://purl.bioontology.org/ontology/EFO; for a browser of OBI (v 2018-02-12) terms please see http://purl.bioontology.org/ontology/OBI	text or EFO and/or OBI	{termLabel} {[termID]}\|{text}	time series design [EFO:EFO_0001779]	investigation	X	X	X	X	X	C	C	X	C	C	C		1	4	MIXS:0000008
6	lat_lon	geographic location (latitude and longitude)	The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system	decimal degrees	{float} {float}	50.586825 6.408977	environment	M	M	M	M	M	M	M	M	M	M	M		1	5	MIXS:0000009
7	depth	geographic location (depth)	Please refer to the definitions of depth in the environmental packages	-	-		environment	E	E	E	E	E	E	E	E	E	E	E		0	6	MIXS:0000018
8	alt	altitude	Altitude is a term used to identify heights of objects such as airplanes, space shuttles, rockets, atmospheric balloons and heights of places such as atmospheric layers and clouds. It is used to measure the height of an object which is above the earth's surface. In this context, the altitude measurement is the vertical distance between the earth's surface above sea level and the sampled position in the air	measurement value	{float} {unit}	100 meter	environment	E	E	E	E	E	E	E	E	E	E	E
9	elev	elevation	Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit	measurement value	{float} {unit}	100 meter	environment	E	E	E	E	E	E	E	E	E	E	E		0	7	MIXS:0000093
10	geo_loc_name	geographic location (country and/or sea,region)	The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (v 1.512) (http://purl.bioontology.org/ontology/GAZ)	country or sea name (INSDC or GAZ);region(GAZ);specific location name	{term};{term};{text}	Germany;North Rhine-Westphalia;Eifel National Park	environment	M	M	M	M	M	M	M	M	M	M	M		1	8	MIXS:0000010
11	collection_date	collection date	The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated, i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008. Except for 2008-01 and 2008, all are ISO8601 compliant	date and time	{timestamp}	2018-05-11T10:00:00+01:00	environment	M	M	M	M	M	M	M	M	M	M	M		1	9	MIXS:0000011
12	env_broad_scale	broad-scale environmental context	In this field, report which major environmental system your sample or specimen came from. The systems identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. were you in the desert or a rainforest?). We recommend using subclasses of ENVO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. Format (one term): termLabel [termID], Format (multiple terms): termLabel [termID]\|termLabel [termID]\|termLabel [termID]. Example: Annotating a water sample from the photic zone in middle of the Atlantic Ocean, consider: oceanic epipelagic zone biome [ENVO:01000033]. Example: Annotating a sample from the Amazon rainforest consider: tropical moist broadleaf forest biome [ENVO:01000228]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html	Add terms that identify the major environment type(s) where your sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes e.g.: mangrove biome [ENVO:01000181]\|estuarine biome [ENVO:01000020]	{termLabel} {[termID]}	forest biome [ENVO:01000174]	environment	M	M	M	M	M	M	M	M	M	M	M		1	10	MIXS:0000012
13	env_local_scale	local environmental context	In this field, report the entity or entities which are in your sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. Please use terms that are present in ENVO and which are of smaller spatial grain than your entry for env_broad_scale. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]\|termLabel [termID]\|termLabel [termID]. Example: Annotating a pooled sample taken from various vegetation layers in a forest consider: canopy [ENVO:00000047]\|herb and fern layer [ENVO:01000337]\|litter layer [ENVO:01000338]\|understory [ENVO:01000335]\|shrub layer [ENVO:01000336]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html	Add terms that identify environmental entities having causal influences upon the entity at time of sampling, multiple terms can be separated by pipes, e.g.: shoreline [ENVO:00000486]\|intertidal zone [ENVO:00000316]	{termLabel} {[termID]}	litter layer [ENVO:01000338]	environment	M	M	M	M	M	M	M	M	M	M	M		1	11	MIXS:0000013
14	env_medium	environmental medium	In this field, report which environmental material or materials (pipe separated) immediately surrounded your sample or specimen prior to sampling, using one or more subclasses of ENVO’s environmental material class: http://purl.obolibrary.org/obo/ENVO_00010483. Format (one term): termLabel [termID]; Format (multiple terms): termLabel [termID]\|termLabel [termID]\|termLabel [termID]. Example: Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond consider: pond water [ENVO:00002228]\|air [ENVO:00002005]. If needed, request new terms on the ENVO tracker, identified here: http://www.obofoundry.org/ontology/envo.html	Add terms that identify the material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. Multiple terms can be separated by pipes e.g.: estuarine water [ENVO:01000301]\|estuarine mud [ENVO:00002160]	{termLabel} {[termID]}	soil [ENVO:00001998]	environment	M	M	M	M	M	M	M	M	M	M	M		1	12	MIXS:0000014
15	env_package	environmental package	MIxS extension for reporting of measurements and observations obtained from one or more of the environments where the sample was obtained. All environmental packages listed here are further defined in separate subtables. By giving the name of the environmental package, a selection of fields can be made from the subtables and can be reported	enumeration	[air\|built environment\|host-associated\|human-associated\|human-skin\|human-oral\|human-gut\|human-vaginal\|hydrocarbon resources-cores\|hydrocarbon resources-fluids/swabs\|microbial mat/biofilm\|misc environment\|plant-associated\|sediment\|soil\|wastewater/sludge\|water]	soil	mixs extension	C	C	C	C	C	C	C	C	C	C	C		1	13	MIXS:0000019
16	subspecf_gen_lin	subspecific genetic lineage	This should provide further information about the genetic distinctness of the sequenced organism by recording additional information e.g. serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. It can also contain alternative taxonomic information. It should contain both the lineage name, and the lineage rank, i.e. biovar:abc123	genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype	{rank name}:{text}	serovar:Newport	nucleic acid sequence source	C	C	C	C	C	-	-	C	-	-	-		1	14	MIXS:0000020
17	ploidy	ploidy	The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated genes and regions of the genome (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:0001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO	PATO	{termLabel} {[termID]}	allopolyploidy [PATO:0001379]	nucleic acid sequence source	X	-	-	-	-	-	-	-	-	-	-		1	15	MIXS:0000021
18	num_replicons	number of replicons	Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote	for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments	{integer}	2	nucleic acid sequence source	X	M	-	C	-	-	-	-	-	-	-		1	16	MIXS:0000022
19	extrachrom_elements	extrachromosomal elements	Do plasmids of significant phenotypic consequence exist (e.g. ones that determine virulence or antibiotic resistance)? Megaplasmids? Other plasmids (Borrelia has 15+ plasmids)?	number of extrachromosomal elements	{integer}	5	nucleic acid sequence source	X	C	-	-	C	-	-	X	-	-	-		1	17	MIXS:0000023
20	estimated_size	estimated size	The estimated size of the genome prior to sequencing. Of particular importance in the sequencing of (eukaryotic) genome which could remain in draft form for a long or unspecified period.	number of base pairs	{integer} bp	300000 bp	nucleic acid sequence source	X	X	X	X	X	-	-	-	-	-	X		1	18	MIXS:0000024
21	ref_biomaterial	reference for biomaterial	Primary publication if isolated before genome publication; otherwise, primary genome report	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	doi:10.1016/j.syapm.2018.01.009	nucleic acid sequence source	X	M	X	X	X	X	-	-	X	X	X		1	19	MIXS:0000025
22	source_mat_id	source material identifiers	A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14', referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2).	for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier	{text}	MPI012345	nucleic acid sequence source	C	C	C	C	C	C	C	C	C	C	C		m	20	MIXS:0000026
23	pathogenicity	known pathogenicity	To what is the entity pathogenic	names of organisms that the entity is pathogenic to	{text}	human, animal, plant, fungi, bacteria	nucleic acid sequence source	C	C	-	C	-	-	-	-	-	-	X		1	21	MIXS:0000027
24	biotic_relationship	observed biotic relationship	Description of relationship(s) between the subject organism and other organism(s) it is associated with. E.g., parasite on species X; mutualist with species Y. The target organism is the subject of the relationship, and the other organism(s) is the object	enumeration	[free living\|parasitism\|commensalism\|symbiotic\|mutualism]	free living	nucleic acid sequence source	X	C	-	X	-	-	-	C	-	-	X		1	22	MIXS:0000028
25	specific_host	specific host	If there is a host involved, please provide its taxid (or environmental if not actually isolated from the dead or alive host - i.e. a pathogen could be isolated from a swipe of a bench etc) and report whether it is a laboratory or natural host	host taxid, unknown, environmental	{NCBI taxid}\|{text}	9606	nucleic acid sequence source	X	C	C	C	-	-	-	-	-	-	X		1	23	MIXS:0000029
26	host_spec_range	host specificity or range	The NCBI taxonomy identifier of the specific host if it is known	NCBI taxid	{integer}	9606	nucleic acid sequence source	X	X	X	C	-	-	-	-	-	-	X		1	24	MIXS:0000030
27	health_disease_stat	health or disease status of specific host at time of collection	Health or disease status of specific host at time of collection	enumeration	[healthy\|diseased\|dead\|disease-free\|undetermined\|recovering\|resolving\|pre-existing condition\|pathological\|life threatening\|congenital]	dead	nucleic acid sequence source	X	C	-	C	-	-	-	-	-	-	-		1	25	MIXS:0000031
28	trophic_level	trophic level	Trophic levels are the feeding position in a food chain. Microbes can be a range of producers (e.g. chemolithotroph)	enumeration	[autotroph\|carboxydotroph\|chemoautotroph\|chemoheterotroph\|chemolithoautotroph\|chemolithotroph\|chemoorganoheterotroph\|chemoorganotroph\|chemosynthetic\|chemotroph\|copiotroph\|diazotroph\|facultative\|autotroph\|heterotroph\|lithoautotroph\|lithoheterotroph\|lithotroph\|methanotroph\|methylotroph\|mixotroph\|obligate\|chemoautolithotroph\|oligotroph\|organoheterotroph\|organotroph\|photoautotroph\|photoheterotroph\|photolithoautotroph\|photolithotroph\|photosynthetic\|phototroph]	heterotroph	nucleic acid sequence source	C	C	-	-	-	-	-	C	-	-	-		1	26	MIXS:0000032
29	propagation	propagation	This field is specific to different taxa. For phages: lytic/lysogenic, for plasmids: incompatibility group, for eukaryotes: sexual/asexual (Note: there is a strong opinion that phage propagation should be named obligately lytic or temperate, therefore we also give this choice)	for virus: lytic, lysogenic, temperate, obligately lytic; for plasmid: incompatibility group; for eukaryote: asexual, sexual	{text}	lytic	nucleic acid sequence source	C	-	M	M	-	-	-	-	-	-	-		1	27	MIXS:0000033
30	encoded_traits	encoded traits	Should include key traits like antibiotic resistance or xenobiotic degradation phenotypes for plasmids, converting genes for phage	for plasmid: antibiotic resistance; for phage: converting genes	{text}	beta-lactamase class A	nucleic acid sequence source	-	X	C	C	-	-	-	-	-	-	-		1	28	MIXS:0000034
31	rel_to_oxygen	relationship to oxygen	Is this organism an aerobe, anaerobe? Please note that aerobic and anaerobic are valid descriptors for microbial environments	enumeration	[aerobe\|anaerobe\|facultative\|microaerophilic\|microanaerobe\|obligate aerobe\|obligate anaerobe]	aerobe	nucleic acid sequence source	-	C	-	-	-	X	X	C	X	X	-		1	29	MIXS:0000015
32	isol_growth_condt	isolation and growth condition	Publication reference in the form of pubmed ID (pmid), digital object identifier (doi) or url for isolation and growth condition specifications of the organism/material	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	doi:10.1016/j.syapm.2018.01.009	nucleic acid sequence source	M	M	M	M	M	-	-	M	-	-	-		1	30	MIXS:0000003
33	samp_collect_device	sample collection device or method	The method or device employed for collecting the sample	type name	{text}	biopsy, niskin bottle, push core	nucleic acid sequence source	X	X	X	X	X	C	C	X	C	C	C		1	31	MIXS:0000002
34	samp_mat_process	sample material processing	Any processing applied to the sample during or after retrieving the sample from environment. This field accepts OBI, for a browser of OBI (v 2018-02-12) terms please see http://purl.bioontology.org/ontology/OBI	text or OBI	{text}\|{termLabel} {[termID]}	filtering of seawater, storing samples in ethanol	nucleic acid sequence source	X	X	X	X	X	C	C	C	C	C	C		1	32	MIXS:0000016
35	size_frac	size fraction selected	Filtering pore size used in sample preparation	filter size value range	{float}-{float} {unit}	0-0.22 micrometer	nucleic acid sequence source	-	-	-	-	-	X	X	-	X	X	C		1	33	MIXS:0000017
36	samp_size	amount or size of sample collected	Amount or size of sample (volume, mass or area) that was collected	measurement value	{float} {unit}	5 liter	nucleic acid sequence source	X	X	X	X	X	C	C	X	C	C	C	milliliter, gram, milligram, liter	1	34	MIXS:0000001
37	source_uvig	source of UViGs	Type of dataset from which the UViG was obtained	enumeration	[metagenome (not viral targeted)\|viral fraction metagenome (virome)\|sequence-targeted metagenome\|metatranscriptome (not viral targeted)\|viral fraction RNA metagenome (RNA virome)\|sequence-targeted RNA metagenome\|microbial single amplified genome (SAG)\|viral single amplified genome (vSAG)\|isolate microbial genome\|other]	viral fraction metagenome (virome)	nucleic acid sequence source	-	-	-	-	-	-	-	-	-	-	M		1	35	MIXS:0000035
38	virus_enrich_appr	virus enrichment approach	List of approaches used to enrich the sample for viruses, if any	enumeration	[filtration\|ultrafiltration\|centrifugation\|ultracentrifugation\|PEG Precipitation\|FeCl Precipitation\|CsCl density gradient\|DNAse\|RNAse\|targeted sequence capture\|other\|none]	filtration + FeCl Precipitation + ultracentrifugation + DNAse	nucleic acid sequence source	-	-	-	C	-	-	-	-	-	-	M		1	36	MIXS:0000036
39	nucl_acid_ext	nucleic acid extraction	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf	sequencing	C	C	C	C	C	C	C	C	C	C	C		1	37	MIXS:0000037
40	nucl_acid_amp	nucleic acid amplification	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	https://phylogenomics.me/protocols/16s-pcr-protocol/	sequencing	C	C	C	C	C	C	C	C	C	C	C		1	38	MIXS:0000038
41	lib_size	library size	Total number of clones in the library prepared for the project	number of clones	{integer}	50	sequencing	X	X	X	X	X	C	C	-	C	C	C		1	39	MIXS:0000039
42	lib_reads_seqd	library reads sequenced	Total number of clones sequenced from the library	number of reads sequenced	{integer}	20	sequencing	X	X	X	X	X	C	C	-	C	C	C		1	40	MIXS:0000040
43	lib_layout	library layout	Specify whether to expect single, paired, or other configuration of reads	enumeration	[paired\|single\|vector\|other]	paired	sequencing	X	X	X	X	X	C	C	-	C	C	C		1	41	MIXS:0000041
44	lib_vector	library vector	Cloning vector type(s) used in construction of libraries	vector	{text}	Bacteriophage P1	sequencing	X	X	X	X	X	C	C	-	C	C	C		1	42	MIXS:0000042
45	lib_screen	library screening strategy	Specific enrichment or screening methods applied before and/or after creating libraries	screening strategy name	{text}	enriched, screened, normalized	sequencing	X	X	X	X	X	C	C	-	C	C	C		1	43	MIXS:0000043
46	target_gene	target gene	Targeted gene or locus name for marker gene studies	gene name	{text}	16S rRNA, 18S rRNA, nif, amoA, rpo	sequencing	-	-	-	-	-	-	M	M	-	-	-		1	44	MIXS:0000044
47	target_subfragment	target subfragment	Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA	gene fragment name	{text}	V6, V9, ITS	sequencing	-	-	-	-	-	-	C	C	-	-	-		1	45	MIXS:0000045
48	pcr_primers	pcr primers	PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters	FWD: forward primer sequence;REV:reverse primer sequence	FWD:{dna};REV:{dna}	FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT	sequencing	-	-	-	-	-	-	C	C	-	-	-		1	46	MIXS:0000046
49	mid	multiplex identifiers	Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters	multiplex identifier sequence	{dna}	GTGAATAT	sequencing	-	-	-	-	-	C	C	-	C	C	C		1	47	MIXS:0000047
50	adapters	adapters	Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters	adapter A and B sequence	{dna};{dna}	AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT	sequencing	C	C	C	C	C	C	C	-	C	C	C		1	48	MIXS:0000048
51	pcr_cond	pcr conditions	Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...'	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35	sequencing	-	-	-	-	-	-	C	C	-	-	-		1	49	MIXS:0000049
52	seq_meth	sequencing method	Sequencing method used; e.g. Sanger, pyrosequencing, ABI-solid	enumeration	[MinION\|GridION\|PromethION\|454 GS\|454 GS 20\|454 GS FLX\|454 GS FLX+\|454 GS FLX Titanium\|454 GS Junior\|Illumina Genome Analyzer\|Illumina Genome Analyzer II\|Illumina Genome Analyzer IIx\|Illumina HiSeq 4000\|Illumina HiSeq 3000\|Illumina HiSeq 2500\|Illumina HiSeq 2000\|Illumina HiSeq 1500\|Illumina HiSeq 1000\|Illumina HiScanSQ\|Illumina MiSeq\|Illumina HiSeq X Five\|Illumina HiSeq X Ten\|Illumina NextSeq 500\|Illumina NextSeq 550\|AB SOLiD System\|AB SOLiD System 2.0\|AB SOLiD System 3.0\|AB SOLiD 3 Plus System\|AB SOLiD 4 System\|AB SOLiD 4hq System\|AB SOLiD PI System\|AB 5500 Genetic Analyzer\|AB 5500xl Genetic Analyzer\|AB 5500xl-W Genetic Analysis System\|Ion Torrent PGM\|Ion Torrent Proton\|Ion Torrent S5\|Ion Torrent S5 XL\|PacBio RS\|PacBio RS II\|Sequel\|AB 3730xL Genetic Analyzer\|AB 3730 Genetic Analyzer\|AB 3500xL Genetic Analyzer\|AB 3500 Genetic Analyzer\|AB 3130xL Genetic Analyzer\|AB 3130 Genetic Analyzer\|AB 310 Genetic Analyzer\|BGISEQ-500]	Illumina HiSeq 1500	sequencing	M	M	M	M	M	M	M	M	M	M	M		1	50	MIXS:0000050
53	seq_quality_check	sequence quality check	Indicate if the sequence has been called by automatic systems (none) or undergone a manual editing procedure (e.g. by inspecting the raw data or chromatograms). Applied only for sequences that are not submitted to SRA,ENA or DRA	none or manually edited	[none\|manually edited]	none	sequencing	-	-	-	-	-	-	C	C	-	-	-		1	51	MIXS:0000051
54	chimera_check	chimera check	A chimeric sequence, or chimera for short, is a sequence comprised of two or more phylogenetically distinct parent sequences. Chimeras are usually PCR artifacts thought to occur when a prematurely terminated amplicon reanneals to a foreign DNA strand and is copied to completion in the following PCR cycles. The point at which the chimeric sequence changes from one parent to the next is called the breakpoint or conversion point	name and version of software, parameters used	{software};{version};{parameters}	uchime;v4.1;default parameters	sequencing	-	-	-	-	-	-	C	C	-	-	-		1	52	MIXS:0000052
55	tax_ident	taxonomic identity marker	The phylogenetic marker(s) used to assign an organism name to the SAG or MAG	enumeration	[16S rRNA gene\|multi-marker approach\|other]	other: rpoB gene	sequencing	C	C	C	C	C	-	-	-	M	M	X		1	53	MIXS:0000053
56	assembly_qual	assembly quality	The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totaling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated	enumeration	[Finished genome\|High-quality draft genome\|Medium-quality draft genome\|Low-quality draft genome\|Genome fragment(s)]	High-quality draft genome	sequencing	M	M	X	X	X	C	-	-	M	M	M		1	54	MIXS:0000056
57	assembly_name	assembly name	Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community	name and version of assembly	{text} {text}	HuRef, JCVI_ISG_i3_1.0	sequencing	C	C	C	C	C	C	-	-	C	C	C		1	55	MIXS:0000057
58	assembly_software	assembly software	Tool(s) used for assembly, including version number and parameters	name and version of software, parameters used	{software};{version};{parameters}	metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise	sequencing	M	M	M	M	M	C	C	-	M	M	M		m	56	MIXS:0000058
59	annot	annotation	Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter	name of tool or pipeline used, or annotation source description	{text}	prokka	sequencing	C	C	C	C	C	C	-	-	X	X	X		1	57	MIXS:0000059
60	number_contig	number of contigs	Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG	value	{integer}	40	sequencing	M	M	X	X	X	C	-	-	X	X	M		1	58	MIXS:0000060
61	feat_pred	feature prediction	Method used to predict UViGs features such as ORFs, integration site, etc.	names and versions of software(s), parameters used	{software};{version};{parameters}	Prodigal;2.6.3;default parameters	sequencing	X	X	X	X	X	X	-	-	X	X	C		1	59	MIXS:0000061
62	ref_db	reference database(s)	List of database(s) used for ORF annotation, along with version number and reference to website or publication	names, versions, and references of databases	{database};{version};{reference}	pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975	sequencing	X	X	X	X	X	X	-	-	X	X	C		1	60	MIXS:0000062
63	sim_search_meth	similarity search method	Tool used to compare ORFs with database, along with version and cutoffs used	names and versions of software(s), parameters used	{software};{version};{parameters}	HMMER3;3.1b2;hmmsearch, cutoff of 50 on score	sequencing	X	X	X	X	X	X	-	-	X	X	C		1	61	MIXS:0000063
64	tax_class	taxonomic classification	Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes	classification method, database name, and other parameters	{text}	vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters)	sequencing	X	X	X	X	X	X	-	-	X	X	C		1	62	MIXS:0000064
65	16s_recover	16S recovered	Can a 16S gene be recovered from the submitted SAG or MAG?	boolean	{boolean}	yes	sequencing	-	-	-	-	-	-	-	-	X	X	-		1	63	MIXS:0000065
66	16s_recover_software	16S recovery software	Tools used for 16S rRNA gene extraction	names and versions of software(s), parameters used	{software};{version};{parameters}	rambl;v2;default parameters	sequencing	-	-	-	-	-	-	-	-	X	X	-		1	64	MIXS:0000066
67	trnas	number of standard tRNAs extracted	The total number of tRNAs identified from the SAG or MAG	value from 0-21	{integer}	18	sequencing	-	-	-	-	-	-	-	-	X	X	X		1	65	MIXS:0000067
68	trna_ext_software	tRNA extraction software	Tools used for tRNA identification	names and versions of software(s), parameters used	{software};{version};{parameters}	infernal;v2;default parameters	sequencing	-	-	-	-	-	-	-	-	X	X	X		1	66	MIXS:0000068
69	compl_score	completeness score	Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. High Quality Draft: >90%, Medium Quality Draft: >50%, and Low Quality Draft: < 50% should have the indicated completeness scores	quality;percent completeness	[high\|med\|low];{percentage}	med;60%	sequencing	X	X	X	X	X	-	-	-	M	M	C		1	67	MIXS:0000069
70	compl_software	completeness software	Tools used for completion estimate, e.g. checkm, anvi'o, busco	names and versions of software(s) used	{software};{version}	checkm	sequencing	X	X	X	X	X	-	-	-	M	M	X		1	68	MIXS:0000070
71	compl_appr	completeness approach	The approach used to determine the completeness of a given SAG or MAG, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome	enumeration	[marker gene\|reference based\|other]	other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83)	sequencing	-	-	-	-	-	-	-	-	X	X	C		1	69	MIXS:0000071
72	contam_score	contamination score	The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable for; High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases	value	{float} percentage	1%	sequencing	-	-	-	-	-	-	-	-	M	M	-		1	70	MIXS:0000072
73	contam_screen_input	contamination screening input	The type of sequence data used as input	enumeration	[reads\| contigs]	contigs	sequencing	-	-	-	-	-	-	-	-	X	X	-		1	71	MIXS:0000005
74	contam_screen_param	contamination screening parameters	Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, i.e. kmer and coverage, or reference database and kmer	enumeration;value or name	[ref db\|kmer\|coverage\|combination];{text\|integer}	kmer	sequencing	-	-	-	-	-	-	-	-	X	X	-		1	72	MIXS:0000073
75	decontam_software	decontamination software	Tool(s) used in contamination screening	enumeration	[checkm/refinem\|anvi'o\|prodege\|bbtools:decontaminate.sh\|acdc\|combination]	anvi'o	sequencing	-	-	-	-	-	-	-	-	X	X	-		1	73	MIXS:0000074
76	sort_tech	sorting technology	Method used to sort/isolate cells or particles of interest	enumeration	[flow cytometric cell sorting\|microfluidics\|lazer-tweezing\|optical manipulation\|micromanipulation\|other]	optical manipulation	sequencing	-	-	-	-	-	-	-	-	M	-	C		1	74	MIXS:0000075
77	single_cell_lysis_appr	single cell or viral particle lysis approach	Method used to free DNA from interior of the cell(s) or particle(s)	enumeration	[chemical\|enzymatic\|physical\|combination]	enzymatic	sequencing	-	-	-	-	-	-	-	-	M	-	C		1	75	MIXS:0000076
78	single_cell_lysis_prot	single cell or viral particle lysis kit protocol	Name of the kit or standard protocol used for cell(s) or particle(s) lysis	kit, protocol name	{text}	ambion single cell lysis kit	sequencing	-	-	-	-	-	-	-	-	X	-	C		1	76	MIXS:0000054
79	wga_amp_appr	WGA amplification approach	Method used to amplify genomic DNA in preparation for sequencing	enumeration	[pcr based\|mda based]	mda based	sequencing	-	-	-	-	-	-	-	-	M	-	C		1	77	MIXS:0000055
80	wga_amp_kit	WGA amplification kit	Kit used to amplify genomic DNA in preparation for sequencing	kit name	{text}	qiagen repli-g	sequencing	-	-	-	-	-	-	-	-	X	-	C		1	78	MIXS:0000006
81	bin_param	binning parameters	The parameters that have been applied during the extraction of genomes from metagenomic datasets	enumeration	[homology search\|kmer\|coverage\|codon usage\|combination]	coverage and kmer	sequencing	-	-	-	-	-	-	-	-	-	M	C		1	79	MIXS:0000077
82	bin_software	binning software	Tool(s) used for the extraction of genomes from metagenomic datasets	enumeration	[metabat\|maxbin\|concoct\|groupm\|esom\|metawatt\|combination\|other]	concoct and maxbin	sequencing	-	-	-	-	-	-	-	-	-	M	C		1	80	MIXS:0000078
83	reassembly_bin	reassembly post binning	Has an assembly been performed on a genome bin extracted from a metagenomic assembly?	boolean	{boolean}	no	sequencing	-	-	-	-	-	-	-	-	-	X	C		1	81	MIXS:0000079
84	mag_cov_software	MAG coverage software	Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets	enumeration	[bwa\|bbmap\|bowtie\|other]	bbmap	sequencing	-	-	-	-	-	-	-	-	-	X	X		1	82	MIXS:0000080
85	vir_ident_software	viral identification software	Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used	software name, version and relevant parameters	{software};{version};{parameters}	VirSorter; 1.0.4; Virome database, category 2	sequencing	-	-	-	-	-	-	-	-	-	-	M		1	83	MIXS:0000081
86	pred_genome_type	predicted genome type	Type of genome predicted for the UViG	enumeration	[DNA\|dsDNA\|ssDNA\|RNA\|dsRNA\|ssRNA\|ssRNA (+)\|ssRNA (-)\|mixed\|uncharacterized]	dsDNA	sequencing	-	-	-	-	-	-	-	-	-	-	M		1	84	MIXS:0000082
87	pred_genome_struc	predicted genome structure	Expected structure of the viral genome	enumeration	[segmented\|non-segmented\|undetermined]	non-segmented	sequencing	-	-	-	-	-	-	-	-	-	-	M		1	85	MIXS:0000083
88	detec_type	detection type	Type of UViG detection	enumeration	[independent sequence (UViG)\|provirus (UpViG)]	independent sequence (UViG)	sequencing	-	-	-	-	-	-	-	-	-	-	M		1	86	MIXS:0000084
89	votu_class_appr	vOTU classification approach	Cutoffs and approach used when clustering new UViGs into "species-level" vOTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside vOTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis	cutoffs and method used	{ANI cutoff};{AF cutoff};{clustering method}	95% ANI;85% AF; greedy incremental clustering	sequencing	-	-	-	-	-	-	-	-	-	-	C		1	87	MIXS:0000085
90	votu_seq_comp_appr	vOTU sequence comparison approach	Tool and thresholds used to compare sequences when computing "species-level" vOTUs	software name, version and relevant parameters	{software};{version};{parameters}	blastn;2.6.0+;e-value cutoff: 0.001	sequencing	-	-	-	-	-	-	-	-	-	-	C		1	88	MIXS:0000086
91	votu_db	vOTU database	Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" vOTUs, if any	database and version	{database};{version}	NCBI Viral RefSeq;83	sequencing	-	-	-	-	-	-	-	-	-	-	C		1	89	MIXS:0000087
92	host_pred_appr	host prediction approach	Tool or approach used for host prediction	enumeration	[provirus\|host sequence similarity\|CRISPR spacer match\|kmer similarity\|co-occurrence\|combination\|other]	CRISPR spacer match	sequencing	-	-	-	-	-	-	-	-	-	-	C		1	90	MIXS:0000088
93	host_pred_est_acc	host prediction estimated accuracy	For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature	false discovery rate	{text}	CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048)	sequencing	-	-	-	-	-	-	-	-	-	-	C		1	91	MIXS:0000089
94	url	relevant electronic resources		URL	{URL}	http://www.earthmicrobiome.org/	sequencing	C	C	C	C	C	C	C	C	C	C	C		m	92	MIXS:0000091
95	sop	relevant standard operating procedures	Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences	reference to SOP	{PMID}\|{DOI}\|{URL}	http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/	sequencing	C	C	C	C	C	C	C	C	C	C	C		m	93	MIXS:0000090
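
The compl_score and contam_score rows above pair a value syntax with quality thresholds, so a consistency check can be sketched in a few lines. The following is a minimal Python sketch, not part of the MIxS checklist itself: the function names (parse_compl_score, parse_contam_score, check_scores) are hypothetical, the regular expressions are one possible reading of "[high|med|low];{percentage}" and "{float} percentage", and a completeness of exactly 50% (which the table leaves undefined) is placed in the low tier as an assumption. It checks only the per-tier thresholds, not the separate 5% deposition rule.

```python
import re

# Assumed reading of the value syntax "[high|med|low];{percentage}", e.g. "med;60%".
_COMPL_RE = re.compile(r"^(high|med|low);\s*(\d+(?:\.\d+)?)%$")
# Assumed reading of "{float} percentage", e.g. "1%".
_CONTAM_RE = re.compile(r"^(\d+(?:\.\d+)?)\s*%$")


def parse_compl_score(value):
    """Split a compl_score string such as 'med;60%' into (tier, percent)."""
    match = _COMPL_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid compl_score: {value!r}")
    return match.group(1), float(match.group(2))


def parse_contam_score(value):
    """Convert a contam_score string such as '1%' into a float percentage."""
    match = _CONTAM_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid contam_score: {value!r}")
    return float(match.group(1))


def tier_from_completeness(percent):
    """Map a completeness percentage to the tier named in the compl_score definition.

    Assumption: exactly 50% falls into 'low', since the table does not define it.
    """
    if percent > 90:
        return "high"
    if percent > 50:
        return "med"
    return "low"


def check_scores(compl_score, contam_score):
    """Return a list of problems; an empty list means the two fields agree."""
    problems = []
    tier, completeness = parse_compl_score(compl_score)
    contamination = parse_contam_score(contam_score)

    expected = tier_from_completeness(completeness)
    if tier != expected:
        problems.append(
            f"tier {tier!r} does not match completeness {completeness}% "
            f"(expected {expected!r})"
        )

    # Contamination limits from the contam_score definition: <5% high, <10% med and low.
    limit = 5.0 if tier == "high" else 10.0
    if contamination >= limit:
        problems.append(
            f"contamination {contamination}% is not below {limit}% for tier {tier!r}"
        )
    return problems
```

Called with the example values from the table, check_scores("med;60%", "1%") reports no problems, while a declared tier that disagrees with its percentage, or a contamination value at or above the tier limit, each add one entry to the returned list.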