| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | AB | AC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Structured comment name | Item (rdfs:label) | Definition | Expected value | Value syntax | Example | Section | MINAS Microbiome | MINAS Pathogen | MINAS SedaDNA | MINAS FINAL Consensus | Preferred unit | Occurrence | MIXS ID | MINAS Microbiome Comment | MINAS Pathogen Comment | MINAS sedaDNA Comment | ||||||||||||
2 | samp_name | sample name | A local identifier or name for the material sample used for extracting nucleic acids, and subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. Use of a globally unique identifier for the field source_mat_id is recommended in addition to sample_name. | text | {text} | ISDsoil1 | investigation | M | M | M | M | 1 | MIXS:0001107 | ||||||||||||||||
3 | samp_taxon_id | Taxonomy ID of DNA sample | NCBI taxon id of the sample. May be a single taxon or mixed taxa sample. Use 'synthetic metagenome' for mock community/positive controls, or 'blank sample' for negative controls. | Taxonomy ID | {text} [NCBI:txid] | Gut Metagenome [NCBI:txid749906] | investigation | M | C/X | M | CONFLICT | 1 | MIXS:0001320 | Problem: SG and capture data contain many microbes, not just studied microbe | |||||||||||||||
4 | project_name | project name | Name of the project within which the sequencing was organized | {text} | Forest soil metagenome | investigation | M | M | M | M | 1 | MIXS:0000092 | Guidance: Name of the first project the sample is associated to | Guidance: Name of the first project the sample is associated to | |||||||||||||||
5 | lat_lon | geographic location (latitude and longitude) | The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system | decimal degrees, limit to 8 decimal points | {float} {float} | 50.586825 6.408977 | environment | C | C | C | C | 1 | MIXS:0000009 | Condition: at least one of lat long or geo_loc_name | Condition: excavation/recovery coordinates known | Condition: at least one of lat long or geo_loc_name | |||||||||||||
6 | depth | depth | The vertical distance below local surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples. | measurement value | {float} {unit} | 10 meter | environment | - | C | - | CONFLICT | Condition: excavation/recovery depth known | Condition: excavation/recovery depth known | ||||||||||||||||
7 | elev | elevation | Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit. | measurement value | {float} {unit} | 100 meter | environment | X | C | X | CONFLICT | 1 | MIXS:0000093 | Condition: for mummy samples or rock shelters (can be inferred from map/topology) | Condition: excavation/recovery coordinates known | Condition: for mummy samples or rock shelters (can be inferred from map/topology) | |||||||||||||
8 | temp | temperature | Temperature of the sample at the time of sampling. | measurement value | {float} {unit} | 25 degree Celsius | environment | - | C | - | CONFLICT | Modified meaning: storage temperature and duration; Condition: storage temperature and duration known | |||||||||||||||||
9 | geo_loc_name | geographic location (country and/or sea,region) | The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (http://purl.bioontology.org/ontology/GAZ) | country or sea name (INSDC or GAZ): region(GAZ), specific location name | {term}: {term}, {text} | USA: Maryland, Bethesda | environment | C | M | C | CONFLICT | 1 | MIXS:0000010 | Condition: either lat long or geo_loc_name | Condition: excavation/recovery depth known | Condition: either lat long or geo_loc_name | |||||||||||||
10 | collection_date | collection date | The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated, i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008. All except 2008-01 and 2008 are ISO8601 compliant | date and time | {timestamp} | 2018-05-11T10:00:00+01:00; 2018-05-11 | environment | C | C | C | C | 1 | MIXS:0000011 | Problem: see comments regarding the different dates 'entry'. However, we feel it can be further defined as - Guidance: removal from main body of sample, later used for extraction; drilling; removal of calculus from tooth; or if not direct sampling from collection, date of removal from collection to take to lab | Problem: need excavation and collection date; timestamp format not applicable to many excavations | Problem: see comments regarding the different dates 'entry'. However, we feel it can be further defined as - Guidance: removal from main body of sample, later used for extraction; drilling; removal of calculus from tooth; or if not direct sampling from collection, date of removal from collection to take to lab | |||||||||||||
11 | neg_cont_type | negative control type | The substance or equipment used as a negative control in an investigation | enumeration or text | [distilled water|phosphate buffer|empty collection device|empty collection tube|DNA-free PCR mix|sterile swab |sterile syringe] | investigation | M | C | M | CONFLICT | 1 | MIXS:0001321 | Condition: use of negative control | ||||||||||||||||
12 | pos_cont_type | positive control type | The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive. | {term} or {text} | investigation | - | C/X | - | CONFLICT | Condition: use of positive control | |||||||||||||||||||
13 | env_broad_scale | broad-scale environmental context | Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS | The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes. | {termLabel} {[termID]} | oceanic epipelagic zone biome [ENVO:01000033] for annotating a water sample from the photic zone in middle of the Atlantic Ocean | environment | X | C/X | X | X | 1 | MIXS:0000012 | Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc. | Condition: coordinates known; Problem: landscapes change through time | Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc. | |||||||||||||
14 | env_local_scale | local environmental context | Report the entity or entities which are in the sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS. | Environmental entities having causal influences upon the entity at time of sampling. | {termLabel} {[termID]} | litter layer [ENVO:01000338]; Annotating a pooled sample taken from various vegetation layers in a forest consider: canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]|litter layer [ENVO:01000338]|understory [ENVO:01000335]|shrub layer [ENVO:01000336]. | environment | X | C/X | X | X | 1 | MIXS:0000013 | Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc. | Guidance: anatomical site sampled | Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc. | |||||||||||||
15 | env_medium | environmental medium | Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS . Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top). | The material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. | {termLabel} {[termID]} | soil [ENVO:00001998]; Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond consider: pond water [ENVO:00002228]|air [ENVO:00002005] | environment | X | C/X | X | X | 1 | MIXS:0000014 | Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc. | Modified meaning: burial environment (e.g. directly interred, stone tomb, cave) | Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc. | |||||||||||||
16 | subspecf_gen_lin | subspecific genetic lineage | Information about the genetic distinctness of the sequenced organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123. | Genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, variety, cultivar. | {rank name}:{text} | serovar:Newport | nucleic acid sequence source | - | C | - | CONTEXT SPECIFIC | Problem: SG and capture data contain many microbes, not just studied microbe | |||||||||||||||||
17 | ploidy | ploidy | The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated genes and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:0001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO | PATO | {termLabel} {[termID]} | allopolyploidy [PATO:0001379] | nucleic acid sequence source | - | C/X | - | CONTEXT SPECIFIC | Problem: SG and capture data contain many microbes, not just studied microbe | |||||||||||||||||
18 | num_replicons | number of replicons | Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote | for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments | {integer} | 2 | nucleic acid sequence source | - | X | - | CONTEXT SPECIFIC | Problem: SG and capture data contain many microbes, not just studied microbe | |||||||||||||||||
19 | extrachrom_elements | extrachromosomal elements | Do plasmids exist of significant phenotypic consequence (e.g. ones that determine virulence or antibiotic resistance)? Megaplasmids? Other plasmids (Borrelia has 15+ plasmids) | number of extrachromosomal elements | {integer} | 5 | nucleic acid sequence source | - | X | - | CONTEXT SPECIFIC | Problem: SG and capture data contain many microbes, not just studied microbe | |||||||||||||||||
20 | ref_biomaterial | reference for biomaterial | Primary publication if isolated before genome publication; otherwise, primary genome report. | PMID, DOI or URL | {PMID}|{DOI}|{URL} | doi:10.1016/j.syapm.2018.01.009 | nucleic acid sequence source | - | C | - | CONFLICT | Condition: sample has been sequenced before | |||||||||||||||||
21 | source_mat_id | source material identifiers | A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14' , referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2). | for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier | {text} | MPI012345 | nucleic acid sequence source | M | M | M | M | m | MIXS:0000026 | Guidance: ID should refer to the institute of origin of the sample (archaeological or Museum code) NOT the project lab code (exceptions: blanks) | Guidance: archaeological or museum ID | Guidance: ID should refer to the institute of origin of the sample (archaeological or Museum code) NOT the project lab code (exceptions: blanks) | |||||||||||||
22 | specific_host | host scientific name | Report the host's taxonomic name and/or NCBI taxonomy ID. | host scientific name, taxonomy ID | {text}|{NCBI taxid} | Homo sapiens and/or 9606 | nucleic acid sequence source | M | M | M | M | 1 | MIXS:0000029 | ||||||||||||||||
23 | host_disease_stat | host disease status | List of diseases with which the host has been diagnosed; can include multiple diagnoses. The value of the field depends on host; for humans the terms should be chosen from the DO (Human Disease Ontology) at https://www.disease-ontology.org, non-human host diseases are free text | disease name or Disease Ontology term | {termLabel} {[termID]}|{text} | rabies [DOID:11260] | nucleic acid sequence source | X | X | X | X | m | MIXS:0000031 | Condition: Should be given if observed, but this depends on type and completeness of remains (yes calculus, but if more than just single tooth; can't be done for palaeofaeces) | Problem: decide if this should refer to skeleton pathology or all pathogen hits in the data | Condition: Should be given if observed, but this depends on type and completeness of remains (yes calculus, but if more than just single tooth; can't be done for palaeofaeces) | |||||||||||||
24 | samp_collec_method | sample collection method | The method employed for collecting the sample. | PMID,DOI,url , or text | {PMID}|{DOI}|{URL}|{text} | swabbing | nucleic acid sequence source | M | M | M | M | 1 | MIXS:0001225 | Guidance: DOI or description of sampling method; be mindful of replicate protocol fields | |||||||||||||||
25 | samp_mat_process | sample material processing | A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed. | text | {text} | filtering of seawater, storing samples in ethanol | nucleic acid sequence source | X | - | X | CONFLICT | 1 | MIXS:0000016 | Problem: See comments re storage conditions and treatment: we may propose to expand these into multiple fields | Problem: SG and capture data contain many microbes, not just studied microbe | Problem: See comments re storage conditions and treatment: we may propose to expand these into multiple fields | |||||||||||||
26 | size_frac | size fraction selected | Filtering pore size used in sample preparation | filter size value range | {float}-{float} {unit} | 0-0.22 micrometer | nucleic acid sequence source | X | - | X | CONFLICT | 1 | MIXS:0000017 | Problem: SG and capture data contain many microbes, not just studied microbe | |||||||||||||||
27 | samp_size | amount or size of sample collected | The total amount or size (volume (ml), mass (g) or area (m2) ) of sample collected. | measurement value | {float} {unit} | 5 liter | nucleic acid sequence source | M | - | M | CONFLICT | milliliter, gram, milligram, liter | 1 | MIXS:0000001 | Problem: Require a value for 'no detection' - calculus can be unweighable | Condition: sample has been sequenced before | Problem: Require a value for 'no detection' - calculus can be unweighable | ||||||||||||
28 | samp_vol_we_dna_ext | sample volume or weight for DNA extraction | Volume (ml) or mass (g) of total collected sample processed for DNA extraction. Note: total sample collected should be entered under the term Sample Size (MIXS:0000001). | measurement value | {float} {unit} | 1500 milliliter | nucleic acid sequence source | M | M | M | M | milliliter, gram, milligram, square centimeter | 1 | MIXS:0000111 | Problem: Require a value for 'no detection' - calculus can be unweighable | Guidance: mass of tissue used for extraction | Problem: Require a value for 'no detection' - calculus can be unweighable | ||||||||||||
29 | virus_enrich_appr | virus enrichment approach | List of approaches used to enrich the sample for viruses, if any | enumeration | [filtration|ultrafiltration|centrifugation|ultracentrifugation|PEG Precipitation|FeCl Precipitation|CsCl density gradient|DNAse|RNAse|targeted sequence capture|other|none] | filtration + FeCl Precipitation + ultracentrifugation + DNAse | nucleic acid sequence source | - | ? | - | CONFLICT | Modified meaning: any enrichment approach used on dataset, not just for viruses | |||||||||||||||||
30 | nucl_acid_ext | nucleic acid extraction | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample | PMID, DOI or URL | {PMID}|{DOI}|{URL} | https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf | sequencing | M | M | M | M | 1 | MIXS:0000037 | ||||||||||||||||
31 | nucl_acid_amp | nucleic acid amplification | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids | PMID, DOI or URL | {PMID}|{DOI}|{URL} | https://phylogenomics.me/protocols/16s-pcr-protocol/ | sequencing | M | C | M | CONFLICT | 1 | MIXS:0000038 | Condition: PCR-based study | |||||||||||||||
32 | lib_reads_seqd | library reads sequenced | Total number of clones sequenced from the library | number of reads sequenced | {integer} | 20 | sequencing | - | M | - | CONFLICT | Guidance: number of reads sequenced, not clones | |||||||||||||||||
33 | lib_layout | library layout | Specify whether to expect single, paired, or other configuration of reads | enumeration | [paired|single|vector|other] | paired | sequencing | M | M | M | M | 1 | MIXS:0000041 | ||||||||||||||||
34 | lib_screen | library screening strategy | Specific enrichment or screening methods applied before and/or after creating libraries | screening strategy name | {text} | enriched, screened, normalized | sequencing | C | M | C | CONFLICT | 1 | MIXS:0000043 | Question: there is already a field for this in general upload. Do we need duplicate entries? | |||||||||||||||
35 | target_gene | target gene | Targeted gene or locus name for marker gene studies | gene name | {text} | 16S rRNA, 18S rRNA, nif, amoA, rpo | sequencing | - | C | - | CONTEXT SPECIFIC | Condition: PCR-based study | |||||||||||||||||
36 | target_subfragment | target subfragment | Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA | gene fragment name | {text} | V6, V9, ITS | sequencing | - | C | - | CONTEXT SPECIFIC | Condition: PCR-based study | |||||||||||||||||
37 | pcr_primers | pcr primers | PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters | FWD: forward primer sequence;REV:reverse primer sequence | FWD:{dna};REV:{dna} | FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT | sequencing | - | C | - | CONTEXT SPECIFIC | Condition: PCR-based study | |||||||||||||||||
38 | mid | multiplex identifiers | Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters | multiplex identifier sequence | {dna} | GTGAATAT | sequencing | M | X | M | CONFLICT | 1 | MIXS:0000047 | ||||||||||||||||
39 | adapters | adapters | Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters | adapter A and B sequence | {dna};{dna} | AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT | sequencing | M | X | M | CONFLICT | 1 | MIXS:0000048 | Problem: adapters meant to be removed already if uploading to ENA; but can we trust that? | Problem: adapters meant to be removed already if uploading to ENA; but can we trust that? | ||||||||||||||
40 | pcr_cond | pcr conditions | Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...' | initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles | initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles | initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35 | sequencing | - | C | - | CONFLICT | Condition: PCR-based study | |||||||||||||||||
41 | seq_meth | sequencing method | Sequencing machine used. Where possible the term should be taken from the OBI list of DNA sequencers (http://purl.obolibrary.org/obo/OBI_0400103). | Text or OBI | {termLabel} {[termID]}|{text} | 454 Genome Sequencer FLX [OBI:0000702] | sequencing | M | M | M | M | 1 | MIXS:0000050 | ||||||||||||||||
42 | chimera_check | chimera check software | Tool(s) used for chimera checking, including version number and parameters, to discover and remove chimeric sequences. A chimeric sequence is comprised of two or more phylogenetically distinct parent sequences. | name and version of software, parameters used | {software};{version};{parameters} | uchime;v4.1;default parameters | sequencing | - | C | - | CONTEXT SPECIFIC | Condition: performed chimera assessment | |||||||||||||||||
43 | tax_ident | taxonomic identity marker | The phylogenetic marker(s) used to assign an organism name to the SAG or MAG | enumeration | [16S rRNA gene|multi-marker approach|other] | other: rpoB gene | sequencing | C | X | C | CONTEXT SPECIFIC | 1 | MIXS:0000053 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
44 | assembly_qual | assembly quality | The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft:Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated | enumeration | [Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)] | High-quality draft genome | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000056 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: assembly data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
45 | assembly_name | assembly name | Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community | name and version of assembly | {text} {text} | HuRef, JCVI_ISG_i3_1.0 | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000057 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: assembly data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
46 | assembly_software | assembly software | Tool(s) used for assembly, including version number and parameters | name and version of software, parameters used | {software};{version};{parameters} | metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000058 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: assembly data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
47 | annot | annotation | Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter | name of tool or pipeline used, or annotation source description | {text} | prokka | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000059 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: annotated assembly | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
48 | number_contig | number of contigs | Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG | value | {integer} | 40 | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000060 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: assembly data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
49 | feat_pred | feature prediction | Method used to predict UViGs features such as ORFs, integration site, etc. | names and versions of software(s), parameters used | {software};{version};{parameters} | Prodigal;2.6.3;default parameters | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000061 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: assembly data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
50 | ref_db | reference database(s) | List of database(s) used for ORF annotation, along with version number and reference to website or publication | names, versions, and references of databases | {database};{version};{reference} | pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975 | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000062 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: assembly data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
51 | sim_search_meth | similarity search method | Tool used to compare ORFs with database, along with version and cutoffs used | names and versions of software(s), parameters used | {software};{version};{parameters} | HMMER3;3.1b2;hmmsearch, cutoff of 50 on score | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000063 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: assembly data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
52 | tax_class | taxonomic classification | Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes | classification method, database name, and other parameters | {text} | vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters) | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000064 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of taxonomic classification are not observable in raw data upload; is this a necessary field? | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
53 | 16s_recover | 16S recovered | Can a 16S gene be recovered from the submitted SAG or MAG? | boolean | {boolean} | yes | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000065 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: 16S rRNA data | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
54 | 16s_recover_software | 16S recovery software | Tools used for 16S rRNA gene extraction | names and versions of software(s), parameters used | {software};{version};{parameters} | rambl;v2;default parameters | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000066 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: 16S rRNA data; Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
55 | trnas | number of standard tRNAs extracted | The total number of tRNAs identified from the SAG or MAG | value from 0-21 | {integer} | 18 | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000067 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
56 | trna_ext_software | tRNA extraction software | Tools used for tRNA identification | names and versions of software(s), parameters used | {software};{version};{parameters} | infernal;v2;default parameters | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000068 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
57 | compl_score | completeness score | Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. High Quality Draft: >90%, Medium Quality Draft: >50%, and Low Quality Draft: < 50% should have the indicated completeness scores | quality;percent completeness | [high|med|low];{percentage} | med;60% | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000069 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
58 | compl_software | completeness software | Tools used for completeness estimation, e.g. checkm, anvi'o, busco | names and versions of software(s) used | {software};{version} | checkm | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000070 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
59 | compl_appr | completeness approach | The approach used to determine the completeness of a given genomic assembly, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome | text | [marker gene|reference based|other] | other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83) | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000071 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
60 | contam_score | contamination score | The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable for: High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases | value | {float} percentage | 1.00% | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000072 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
61 | contam_screen_input | contamination screening input | The type of sequence data used as input | enumeration | [reads|contigs] | contigs | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000005 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
62 | contam_screen_param | contamination screening parameters | Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, e.g. kmer and coverage, or reference database and kmer | enumeration;value or name | [ref db|kmer|coverage|combination];{text|integer} | kmer | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000073 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
63 | decontam_software | decontamination software | Tool(s) used in contamination screening | enumeration | [checkm/refinem|anvi'o|prodege|bbtools:decontaminate.sh|acdc|combination] | anvi'o | sequencing | C | X/C | C | CONTEXT SPECIFIC | 1 | MIXS:0000074 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
64 | bin_param | binning parameters | The parameters that have been applied during the extraction of genomes from metagenomic datasets | enumeration | [homology search|kmer|coverage|codon usage|combination] | coverage and kmer | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000077 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
65 | bin_software | binning software | Tool(s) used for the extraction of genomes from metagenomic datasets, where possible include a product ID (PID) of the tool(s) used. | names and versions of software(s) used | {software};{version}{PID} | MetaCluster-TA (RRID:SCR_004599), MaxBin (biotools:maxbin) | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000078 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
66 | reassembly_bin | reassembly post binning | Has an assembly been performed on a genome bin extracted from a metagenomic assembly? | boolean | {boolean} | no | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000079 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
67 | mag_cov_software | MAG coverage software | Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets | enumeration | [bwa|bbmap|bowtie|other] | bbmap | sequencing | C | C | C | CONTEXT SPECIFIC | 1 | MIXS:0000080 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
68 | vir_ident_software | viral identification software | Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used | software name, version and relevant parameters | {software};{version};{parameters} | VirSorter; 1.0.4; Virome database, category 2 | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000081 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
69 | pred_genome_type | predicted genome type | Type of genome predicted for the UViG | enumeration | [DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized] | dsDNA | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000082 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
70 | pred_genome_struc | predicted genome structure | Expected structure of the viral genome | enumeration | [segmented|non-segmented|undetermined] | non-segmented | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000083 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
71 | detec_type | detection type | Type of UViG detection | enumeration | [independent sequence (UViG)|provirus (UpViG)] | independent sequence (UViG) | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000084 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
72 | otu_class_appr | OTU classification approach | Cutoffs and approach used when clustering “species-level” OTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside OTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis | cutoffs and method used | {ANI cutoff};{AF cutoff};{clustering method} | 95% ANI;85% AF; greedy incremental clustering | sequencing | C | X | C | CONTEXT SPECIFIC | 1 | MIXS:0000085 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
73 | otu_seq_comp_appr | OTU sequence comparison approach | Tool and thresholds used to compare sequences when computing "species-level" OTUs | software name, version and relevant parameters | {software};{version};{parameters} | blastn;2.6.0+;e-value cutoff: 0.001 | sequencing | C | X | C | CONTEXT SPECIFIC | 1 | MIXS:0000086 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
74 | otu_db | OTU database | Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" OTUs, if any | database and version | {database};{version} | NCBI Viral RefSeq;83 | sequencing | C | X | C | CONTEXT SPECIFIC | 1 | MIXS:0000087 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Problem: results of analysis are not observable in raw data upload | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | |||||||||||||
75 | host_pred_appr | host prediction approach | Tool or approach used for host prediction | enumeration | [provirus|host sequence similarity|CRISPR spacer match|kmer similarity|co-occurrence|combination|other] | CRISPR spacer match | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000088 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
76 | host_pred_est_acc | host prediction estimated accuracy | For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature | false discovery rate | {text} | CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048) | sequencing | C | - | C | CONTEXT SPECIFIC | 1 | MIXS:0000089 | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads! | ||||||||||||||
77 | associated resource | relevant electronic resources | A related resource that is referenced, cited, or otherwise associated to the sequence. | reference to resource | {PMID}|{DOI}|{URL} | http://www.earthmicrobiome.org/ | sequencing | C | X | C | CONFLICT | m | MIXS:0000091 | ||||||||||||||||
78 | sop | relevant standard operating procedures | Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences | reference to SOP | {PMID}|{DOI}|{URL} | http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/ | sequencing | C | C | C | C | m | MIXS:0000090 | Condition: assembly data; Problem: results of analysis are not observable in raw data upload | |||||||||||||||