Structured comment name	Item (rdfs:label)	Definition	Expected value	Value syntax	Example	Section	MINAS Microbiome	MINAS Pathogen	MINAS SedaDNA	MINAS FINAL Consensus	Preferred unit	Occurrence	MIXS ID	MINAS Microbiome Comment	MINAS Pathogen Comment	MINAS sedaDNA Comment
samp_name	sample name	A local identifier or name for the material sample used for extracting nucleic acids, and subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. Use of a globally unique identifier for the field source_mat_id is recommended in addition to sample_name.	text	{text}	ISDsoil1	investigation	M	M	M	M		1	MIXS:0001107
samp_taxon_id	Taxonomy ID of DNA sample	NCBI taxon id of the sample. May be a single taxon or mixed taxa sample. Use 'synthetic metagenome' for mock community/positive controls, or 'blank sample' for negative controls.	Taxonomy ID	{text} [NCBI:txid]	Gut Metagenome [NCBI:txid749906]	investigation	M	C/X	M	CONFLICT		1	MIXS:0001320		Problem: SG and capture data contain many microbes, not just studied microbe
project_name	project name	Name of the project within which the sequencing was organized		{text}	Forest soil metagenome	investigation	M	M	M	M		1	MIXS:0000092	Guidance: Name of the first project the sample is associated to		Guidance: Name of the first project the sample is associated to
lat_lon	geographic location (latitude and longitude)	The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system	decimal degrees, limit to 8 decimal points	{float} {float}	50.586825 6.408977	environment	C	C	C	C		1	MIXS:0000009	Condition: at least one of lat long or geo_loc_name	Condition: excavation/recovery coordinates known	Condition: at least one of lat long or geo_loc_name
depth	depth	The vertical distance below local surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples.	measurement value	{float} {unit}	10 meter	environment	-	C	-	CONFLICT				Condition: excavation/recovery depth known	Condition: excavation/recovery depth known
elev	elevation	Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit.	measurement value	{float} {unit}	100 meter	environment	X	C	X	CONFLICT		1	MIXS:0000093	Condition: for mummy samples or rock shelters (can be inferred from map/topology)	Condition: excavation/recovery coordinates known	Condition: for mummy samples or rock shelters (can be inferred from map/topology)
temp	temperature	Temperature of the sample at the time of sampling.	measurement value	{float} {unit}	25 degree Celsius	environment	-	C	-	CONFLICT					Modified meaning: storage temperature and duration; Condition: storage temperature and duration known
geo_loc_name	geographic location (country and/or sea,region)	The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (http://purl.bioontology.org/ontology/GAZ)	country or sea name (INSDC or GAZ): region(GAZ), specific location name	{term}: {term}, {text}	USA: Maryland, Bethesda	environment	C	M	C	CONFLICT		1	MIXS:0000010	Condition: either lat long or geo_loc_name	Condition: excavation/recovery depth known	Condition: either lat long or geo_loc_name
collection_date	collection date	The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are ISO8601 compliant	date and time	{timestamp}	2018-05-11T10:00:00+01:00; 2018-05-11	environment	C	C	C	C		1	MIXS:0000011	Problem: see comments regarding the different dates 'entry'. However, we feel it can be further defined as - Guidance: removal from main body of sample, later used for extraction; drilling; removal of calculus from tooth; or (if not direct sampling from collection) date of removal from collection to take to lab	Problem: need excavation and collection date; timestamp format not applicable to many excavations	Problem: see comments regarding the different dates 'entry'. However, we feel it can be further defined as - Guidance: removal from main body of sample, later used for extraction; drilling; removal of calculus from tooth; or (if not direct sampling from collection) date of removal from collection to take to lab
neg_cont_type	negative control type	The substance or equipment used as a negative control in an investigation	enumeration or text	[distilled water|phosphate buffer|empty collection device|empty collection tube|DNA-free PCR mix|sterile swab|sterile syringe]		investigation	M	C	M	CONFLICT		1	MIXS:0001321		Condition: use of negative control
pos_cont_type	positive control type	The substance, mixture, product, or apparatus used to verify that a process which is part of an investigation delivers a true positive.		{term} or {text}		investigation	-	C/X	-	CONFLICT					Condition: use of positive control
env_broad_scale	broad-scale environmental context	Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS	The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes.	{termLabel} {[termID]}	oceanic epipelagic zone biome [ENVO:01000033] for annotating a water sample from the photic zone in middle of the Atlantic Ocean	environment	X	C/X	X	X		1	MIXS:0000012	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.	Condition: coordinates known; Problem: landscapes change through time	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.
env_local_scale	local environmental context	Report the entity or entities which are in the sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS.	Environmental entities having causal influences upon the entity at time of sampling.	{termLabel} {[termID]}	litter layer [ENVO:01000338]; Annotating a pooled sample taken from various vegetation layers in a forest consider: canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]|litter layer [ENVO:01000338]|understory [ENVO:01000335]|shrub layer [ENVO:01000336].	environment	X	C/X	X	X		1	MIXS:0000013	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.	Guidance: anatomical site sampled	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.
env_medium	environmental medium	Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS . Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top).	The material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483].	{termLabel} {[termID]}	soil [ENVO:00001998]; Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond consider: pond water [ENVO:00002228]|air [ENVO:00002005]	environment	X	C/X	X	X		1	MIXS:0000014	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.	Modified meaning: burial environment (e.g. directly interred, stone tomb, cave)	Problem: See comments - to revisit based on definitions and ontologies; need more guidance to what level refers to which; ENVO missing stuff like burials, cemeteries etc.
subspecf_gen_lin	subspecific genetic lineage	Information about the genetic distinctness of the sequenced organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123.	Genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, variety, cultivar.	{rank name}:{text}	serovar:Newport	nucleic acid sequence source	-	C	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
ploidy	ploidy	The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated genes and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:0001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO	PATO	{termLabel} {[termID]}	allopolyploidy [PATO:0001379]	nucleic acid sequence source	-	C/X	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
num_replicons	number of replicons	Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote	for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments	{integer}	2	nucleic acid sequence source	-	X	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
extrachrom_elements	extrachromosomal elements	Do plasmids exist of significant phenotypic consequence (e.g. ones that determine virulence or antibiotic resistance). Megaplasmids? Other plasmids (borrelia has 15+ plasmids)	number of extrachromosomal elements	{integer}	5	nucleic acid sequence source	-	X	-	CONTEXT SPECIFIC					Problem: SG and capture data contain many microbes, not just studied microbe
ref_biomaterial	reference for biomaterial	Primary publication if isolated before genome publication; otherwise, primary genome report.	PMID, DOI or URL	{PMID}|{DOI}|{URL}	doi:10.1016/j.syapm.2018.01.009	nucleic acid sequence source	-	C	-	CONFLICT					Condition: sample has been sequenced before
source_mat_id	source material identifiers	A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14', referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2).	for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier	{text}	MPI012345	nucleic acid sequence source	M	M	M	M		m	MIXS:0000026	Guidance: ID should refer to the institute of origin of the sample (archaeological or Museum code) NOT the project lab code (exceptions: blanks)	Guidance: archaeological or museum ID	Guidance: ID should refer to the institute of origin of the sample (archaeological or Museum code) NOT the project lab code (exceptions: blanks)
specific_host	host scientific name	Report the host's taxonomic name and/or NCBI taxonomy ID.	host scientific name, taxonomy ID	{text}|{NCBI taxid}	Homo sapiens and/or 9606	nucleic acid sequence source	M	M	M	M		1	MIXS:0000029
host_disease_stat	host disease status	List of diseases with which the host has been diagnosed; can include multiple diagnoses. The value of the field depends on host; for humans the terms should be chosen from the DO (Human Disease Ontology) at https://www.disease-ontology.org, non-human host diseases are free text	disease name or Disease Ontology term	{termLabel} {[termID]}|{text}	rabies [DOID:11260]	nucleic acid sequence source	X	X	X	X		m	MIXS:0000031	Condition: Should be given if observed, but this depends on type and completeness of remains (yes calculus, but if more than just single tooth; can't be done for palaeofaeces)	Problem: decide if this should refer to skeleton pathology or all pathogen hits in the data	Condition: Should be given if observed, but this depends on type and completeness of remains (yes calculus, but if more than just single tooth; can't be done for palaeofaeces)
samp_collec_method	sample collection method	The method employed for collecting the sample.	PMID, DOI, URL, or text	{PMID}|{DOI}|{URL}|{text}	swabbing	nucleic acid sequence source	M	M	M	M		1	MIXS:0001225		Guidance: DOI or description of sampling method; be mindful of replicate protocol fields
samp_mat_process	sample material processing	A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed.	text	{text}	filtering of seawater, storing samples in ethanol	nucleic acid sequence source	X	-	X	CONFLICT		1	MIXS:0000016	Problem: See comments re storage conditions and treatment: we may propose to expand these into multiple fields	Problem: SG and capture data contain many microbes, not just studied microbe	Problem: See comments re storage conditions and treatment: we may propose to expand these into multiple fields
size_frac	size fraction selected	Filtering pore size used in sample preparation	filter size value range	{float}-{float} {unit}	0-0.22 micrometer	nucleic acid sequence source	X	-	X	CONFLICT		1	MIXS:0000017		Problem: SG and capture data contain many microbes, not just studied microbe
samp_size	amount or size of sample collected	The total amount or size (volume (ml), mass (g) or area (m2)) of sample collected.	measurement value	{float} {unit}	5 liter	nucleic acid sequence source	M	-	M	CONFLICT	milliliter, gram, milligram, liter	1	MIXS:0000001	Problem: Require a value for 'no detection' - calculus can be unweighable	Condition: sample has been sequenced before	Problem: Require a value for 'no detection' - calculus can be unweighable
samp_vol_we_dna_ext	sample volume or weight for DNA extraction	Volume (ml) or mass (g) of total collected sample processed for DNA extraction. Note: total sample collected should be entered under the term Sample Size (MIXS:0000001).	measurement value	{float} {unit}	1500 milliliter	nucleic acid sequence source	M	M	M	M	milliliter, gram, milligram, square centimeter	1	MIXS:0000111	Problem: Require a value for 'no detection' - calculus can be unweighable	Guidance: mass of tissue used for extraction	Problem: Require a value for 'no detection' - calculus can be unweighable
virus_enrich_appr	virus enrichment approach	List of approaches used to enrich the sample for viruses, if any	enumeration	[filtration|ultrafiltration|centrifugation|ultracentrifugation|PEG Precipitation|FeCl Precipitation|CsCl density gradient|DNAse|RNAse|targeted sequence capture|other|none]	filtration + FeCl Precipitation + ultracentrifugation + DNAse	nucleic acid sequence source	-	?	-	CONFLICT					Modified meaning: any enrichment approach used on dataset, not just for viruses
nucl_acid_ext	nucleic acid extraction	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample	PMID, DOI or URL	{PMID}|{DOI}|{URL}	https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf	sequencing	M	M	M	M		1	MIXS:0000037
nucl_acid_amp	nucleic acid amplification	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids	PMID, DOI or URL	{PMID}|{DOI}|{URL}	https://phylogenomics.me/protocols/16s-pcr-protocol/	sequencing	M	C	M	CONFLICT		1	MIXS:0000038		Condition: PCR-based study
lib_reads_seqd	library reads sequenced	Total number of clones sequenced from the library	number of reads sequenced	{integer}	20	sequencing	-	M	-	CONFLICT					Guidance: number of reads sequenced, not clones
lib_layout	library layout	Specify whether to expect single, paired, or other configuration of reads	enumeration	[paired|single|vector|other]	paired	sequencing	M	M	M	M		1	MIXS:0000041
lib_screen	library screening strategy	Specific enrichment or screening methods applied before and/or after creating libraries	screening strategy name	{text}	enriched, screened, normalized	sequencing	C	M	C	CONFLICT		1	MIXS:0000043		Question: there is already a field for this in general upload. Do we need duplicate entries?
target_gene	target gene	Targeted gene or locus name for marker gene studies	gene name	{text}	16S rRNA, 18S rRNA, nif, amoA, rpo	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: PCR-based study
target_subfragment	target subfragment	Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA	gene fragment name	{text}	V6, V9, ITS	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: PCR-based study
pcr_primers	pcr primers	PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters	FWD: forward primer sequence;REV:reverse primer sequence	FWD:{dna};REV:{dna}	FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: PCR-based study
mid	multiplex identifiers	Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters	multiplex identifier sequence	{dna}	GTGAATAT	sequencing	M	X	M	CONFLICT		1	MIXS:0000047
adapters	adapters	Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters	adapter A and B sequence	{dna};{dna}	AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT	sequencing	M	X	M	CONFLICT		1	MIXS:0000048	Problem: adapters meant to be removed already if uploading to ENA; but can we trust that?		Problem: adapters meant to be removed already if uploading to ENA; but can we trust that?
pcr_cond	pcr conditions	Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...'	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35	sequencing	-	C	-	CONFLICT					Condition: PCR-based study
seq_meth	sequencing method	Sequencing machine used. Where possible the term should be taken from the OBI list of DNA sequencers (http://purl.obolibrary.org/obo/OBI_0400103).	Text or OBI	{termLabel} {[termID]}|{text}	454 Genome Sequencer FLX [OBI:0000702]	sequencing	M	M	M	M		1	MIXS:0000050
chimera_check	chimera check software	Tool(s) used for chimera checking, including version number and parameters, to discover and remove chimeric sequences. A chimeric sequence is comprised of two or more phylogenetically distinct parent sequences.	name and version of software, parameters used	{software};{version};{parameters}	uchime;v4.1;default parameters	sequencing	-	C	-	CONTEXT SPECIFIC					Condition: performed chimera assessment
tax_ident	taxonomic identity marker	The phylogenetic marker(s) used to assign an organism name to the SAG or MAG	enumeration	[16S rRNA gene|multi-marker approach|other]	other: rpoB gene	sequencing	C	X	C	CONTEXT SPECIFIC		1	MIXS:0000053	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
assembly_qual	assembly quality	The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated	enumeration	[Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)]	High-quality draft genome	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000056	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
assembly_name	assembly name	Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community	name and version of assembly	{text} {text}	HuRef, JCVI_ISG_i3_1.0	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000057	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
assembly_software	assembly software	Tool(s) used for assembly, including version number and parameters	name and version of software, parameters used	{software};{version};{parameters}	metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000058	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
annot	annotation	Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter	name of tool or pipeline used, or annotation source description	{text}	prokka	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000059	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: annotated assembly	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
number_contig	number of contigs	Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG	value	{integer}	40	sequencing	C	C	C	CONTEXT SPECIFIC		1	MIXS:0000060	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
feat_pred	feature prediction	Method used to predict UViGs features such as ORFs, integration site, etc.	names and versions of software(s), parameters used	{software};{version};{parameters}	Prodigal;2.6.3;default parameters	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000061	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
ref_db	reference database(s)	List of database(s) used for ORF annotation, along with version number and reference to website or publication	names, versions, and references of databases	{database};{version};{reference}	pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000062	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
sim_search_meth	similarity search method	Tool used to compare ORFs with database, along with version and cutoffs used	names and versions of software(s), parameters used	{software};{version};{parameters}	HMMER3;3.1b2;hmmsearch, cutoff of 50 on score	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000063	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: assembly data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
tax_class	taxonomic classification	Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes	classification method, database name, and other parameters	{text}	vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters)	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000064	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of taxonomic classification are not observable in raw data upload; is this a necessary field?	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
16s_recover	16S recovered	Can a 16S gene be recovered from the submitted SAG or MAG?	boolean	{boolean}	yes	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000065	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: 16S rRNA data	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
16s_recover_software	16S recovery software	Tools used for 16S rRNA gene extraction	names and versions of software(s), parameters used	{software};{version};{parameters}	rambl;v2;default parameters	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000066	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Condition: 16S rRNA data; Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
trnas	number of standard tRNAs extracted	The total number of tRNAs identified from the SAG or MAG	value from 0-21	{integer}	18	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000067	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
trna_ext_software	tRNA extraction software	Tools used for tRNA identification	names and versions of software(s), parameters used	{software};{version};{parameters}	infernal;v2;default parameters	sequencing	C	-	C	CONTEXT SPECIFIC		1	MIXS:0000068	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!		Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
compl_score	completeness score	Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. High Quality Draft: >90%, Medium Quality Draft: >50%, and Low Quality Draft: < 50% should have the indicated completeness scores	quality;percent completeness	[high|med|low];{percentage}	med;60%	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000069	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
compl_software	completeness software	Tools used for completion estimate, e.g. checkm, anvi'o, busco	names and versions of software(s) used	{software};{version}	checkm	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000070	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
compl_appr	completeness approach	The approach used to determine the completeness of a given genomic assembly, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome	text	[marker gene|reference based|other]	other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83)	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000071	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
contam_score	contamination score	The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable for; High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases	value	{float} percentage	1.00%	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000072	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
contam_screen_input	contamination screening input	The type of sequence data used as input	enumeration	[reads|contigs]	contigs	sequencing	C	X/C	C	CONTEXT SPECIFIC		1	MIXS:0000005	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!	Problem: results of analysis are not observable in raw data upload	Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
62
contam_screen_paramcontamination screening parametersSpecific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, e.g. kmer and coverage, or reference database and kmerenumeration;value or name[ref db|kmer|coverage|combination];{text|integer}kmersequencingCX/CCCONTEXT SPECIFIC1MIXS:0000073Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
63
decontam_softwaredecontamination softwareTool(s) used in contamination screeningenumeration[checkm/refinem|anvi'o|prodege|bbtools:decontaminate.sh|acdc|combination]anvi'osequencingCX/CCCONTEXT SPECIFIC1MIXS:0000074Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
64
bin_parambinning parametersThe parameters that have been applied during the extraction of genomes from metagenomic datasetsenumeration[homology search|kmer|coverage|codon usage|combination]coverage and kmersequencingCCCCONTEXT SPECIFIC1MIXS:0000077Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
65
bin_softwarebinning softwareTool(s) used for the extraction of genomes from metagenomic datasets, where possible include a product ID (PID) of the tool(s) used.names and versions of software(s) used{software};{version}{PID}MetaCluster-TA (RRID:SCR_004599), MaxBin (biotools:maxbin)sequencingCCCCONTEXT SPECIFIC1MIXS:0000078Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
66
reassembly_binreassembly post binningHas an assembly been performed on a genome bin extracted from a metagenomic assembly?boolean{boolean}nosequencingCCCCONTEXT SPECIFIC1MIXS:0000079Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
67
mag_cov_softwareMAG coverage softwareTool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasetsenumeration[bwa|bbmap|bowtie|other]bbmapsequencingCCCCONTEXT SPECIFIC1MIXS:0000080Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: performed taxonomic binning; Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
68
vir_ident_softwareviral identification softwareTool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs usedsoftware name, version and relevant parameters{software};{version};{parameters}VirSorter; 1.0.4; Virome database, category 2sequencingC-CCONTEXT SPECIFIC1MIXS:0000081Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
69
pred_genome_typepredicted genome typeType of genome predicted for the UViGenumeration[DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized]dsDNAsequencingC-CCONTEXT SPECIFIC1MIXS:0000082Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
70
pred_genome_strucpredicted genome structureExpected structure of the viral genomeenumeration[segmented|non-segmented|undetermined]non-segmentedsequencingC-CCONTEXT SPECIFIC1MIXS:0000083Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
71
detec_typedetection typeType of UViG detectionenumeration[independent sequence (UViG)|provirus (UpViG)]independent sequence (UViG)sequencingC-CCONTEXT SPECIFIC1MIXS:0000084Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
72
otu_class_apprOTU classification approachCutoffs and approach used when clustering “species-level” OTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside OTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysiscutoffs and method used{ANI cutoff};{AF cutoff};{clustering method}95% ANI;85% AF; greedy incremental clusteringsequencingCXCCONTEXT SPECIFIC1MIXS:0000085Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
73
otu_seq_comp_apprOTU sequence comparison approachTool and thresholds used to compare sequences when computing "species-level" OTUssoftware name, version and relevant parameters{software};{version};{parameters}blastn;2.6.0+;e-value cutoff: 0.001sequencingCXCCONTEXT SPECIFIC1MIXS:0000086Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
74
otu_dbOTU databaseReference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" OTUs, if anydatabase and version{database};{version}NCBI Viral RefSeq;83sequencingCXCCONTEXT SPECIFIC1MIXS:0000087Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Problem: results of analysis are not observable in raw data uploadCondition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
75
host_pred_apprhost prediction approachTool or approach used for host predictionenumeration[provirus|host sequence similarity|CRISPR spacer match|kmer similarity|co-occurrence|combination|other]CRISPR spacer matchsequencingC-CCONTEXT SPECIFIC1MIXS:0000088Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
76
host_pred_est_acchost prediction estimated accuracyFor each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literaturefalse discovery rate{text}CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048)sequencingC-CCONTEXT SPECIFIC1MIXS:0000089Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!Condition: if assembly performed; only for MAG submission to e.g. MGnify, not raw reads!
77
associated resourcerelevant electronic resourcesA related resource that is referenced, cited, or otherwise associated with the sequence.reference to resource{PMID} | {DOI} | {URL}http://www.earthmicrobiome.org/sequencingCXCCONFLICTmMIXS:0000091
78
soprelevant standard operating proceduresStandard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequencesreference to SOP{PMID}|{DOI}|{URL}http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/sequencingCCCCmMIXS:0000090Condition: assembly data; Problem: results of analysis are not observable in raw data upload
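The "Value syntax" column above lends itself to a simple automated check before submission. The following is a minimal sketch only, assuming metadata is held as a plain dictionary keyed by structured comment name; the function names `check_field` and `check_record` are hypothetical and not part of MIxS or MINAS. It covers three enumerations copied from the rows above and the `contam_score` syntax with the 5% limit stated in its definition.

```python
import re

# Enumerations copied from the "Value syntax" column of the rows above.
ENUMS = {
    "contam_screen_input": {"reads", "contigs"},
    "pred_genome_struc": {"segmented", "non-segmented", "undetermined"},
    "detec_type": {"independent sequence (UViG)", "provirus (UpViG)"},
}

# contam_score syntax is "{float} percentage", e.g. "1.00%".
PERCENT = re.compile(r"^(\d+(?:\.\d+)?)\s*%$")


def check_field(name, value):
    """Return a list of problems for one field (empty list means OK).

    Hypothetical helper: fields without a rule here are accepted as-is.
    """
    problems = []
    if name in ENUMS and value not in ENUMS[name]:
        problems.append(f"{name}: {value!r} is not one of {sorted(ENUMS[name])}")
    if name == "contam_score":
        match = PERCENT.match(value)
        if match is None:
            problems.append(f"{name}: {value!r} does not look like '{{float}}%'")
        elif float(match.group(1)) >= 5.0:
            # Threshold taken from the contam_score definition in the table.
            problems.append(f"{name}: {value} is not below the 5% deposition limit")
    return problems


def check_record(record):
    """Check every field in a dict of {structured comment name: value}."""
    problems = []
    for name, value in record.items():
        problems.extend(check_field(name, value))
    return problems
```

For example, `check_record({"contam_screen_input": "contigs", "contam_score": "1.00%"})` returns an empty list, while a `contam_score` of `"7.5%"` is reported as exceeding the limit. Conditional fields (C, X, "-") are not modelled here; that logic depends on the submission context described in the comment columns.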