NOAA_MIMARKS.survey.host-associated.6.0_sharing

	A	B	C
1	# v1.0.3
2	This metadata template is for use with marker gene sequence data derived from host-associated environmental samples. It is adapted from the MIMARKS: survey, host-associated package to include Darwin Core terms and terms recommended by NOAA Omics.
3
4	Sheet definitions
5	study_data	Metadata about the study, such as project name, description, funding info, and other project-level metadata required by NCBI and OBIS. This is filled out at the start of a project.
6	sample_data	Contextual data about the samples collected, such as when each sample was collected, where it was collected from, what kind of sample it is, and the properties of the environment or experimental condition from which it was taken. Each row is a distinct sample. Most of this information is recorded during sample collection. Many terms have controlled vocabulary, such as organism, env_broad_scale, waterBody. This file contains information that is submitted to NCBI when generating a BioSample. Other important fields for metadata processing include amplicon_sequenced, which helps to link together different types of metadata. This sheet contains terms from the MIMARKS survey host-associated 6.0 package. For other types of samples (eg, sediment), use the appropriate AOML_MIMARKS.survey.sediment file.
7	prep_data	Contextual data about how the samples were prepared for sequencing, including how they were extracted, what amplicon was targeted, and how they were sequenced. The 1st section of this file is in the format for an NCBI SRA upload and should NOT be rearranged or renamed. Each row is a separate sequencing library preparation, distinguished by a unique library_id. One sample from sample_data could be represented multiple times on this sheet if multiple marker genes were amplified.
8	analysis_data	Data about processing from raw sequences to the derived outputs, including software versions, processing parameters, reference database used. Often there is only one row for each amplicon that is sequenced.
9	asv_data	File generated by Tourmaline, containing ASV featureid, DNA sequence, assigned taxonomy, confidence in taxonomy, and read counts for each sample. This file is not stored in the metadata sheet and is not required for submission to NCBI, but is necessary for submission to OBIS. Sample names in the file must match names in the metadata template.
10
11	Workflow	New project	Transferring existing project metadata
12	Project initiation: study_data	Upon initiating a project, create a copy of the NOAA_MIMARKS.survey.host-associated.6.0 Google Sheet (File -> Make a Copy) and save it in the project's Google Drive Metadata folder with the project_id at the start of the file name (eg, gomecc_AOML_MIMARKS.survey.host-associated.6.0). Fill in as much info to `study_data` as is known. You will not have info for 'accessions' until you submit data to NCBI and OBIS.	Create a copy of the AOML_MIMARKS.survey.water.6.0 Google Sheet (File -> Make a Copy) and save it in the project's Google Drive Metadata folder with the project_id at the start of the file name (eg, gomecc_AOML_MIMARKS.survey.water.6.0). Fill in as much info to `study_data` as is known. You will not have info for 'accessions' until you submit data to NCBI and OBIS.
13	Sample collection: sample_data	During sample collection, record required sample-specific info on a separate local data sheet. Key fields to record include serial_number, line_id, station, ctd_bottle_no, sample_replicate, sample_type, notes_sampling, collection_date_local, depth, decimalLongitude, decimalLatitude, geodeticDatum, samp_vol_we_dna_ext, and any environmental variable not being recorded by others on the cruise. As soon as possible, transfer these data to your sample_data sheet, with one row for each filtered water sample. Have someone else double check that all values were input correctly and that GPS coordinates and dates fall within the expected range. Use the sample_data to generate unique sample names for each distinct water sample to be DNA extracted.	Transfer sample data from an existing metadata sheet to your sample_data sheet, with one row for each filtered host-associated sample. Have someone else double check that all values were input correctly and that GPS coordinates and dates fall within the expected range. Use the sample_data to generate unique sample names for each distinct host-associated sample to be DNA extracted. Ensure that existing data match the required formats, and convert any that do not. For example, collection_date should be in UTC time and ISO format, while collection_date_local can be in local time, ISO format. For data that does not match columns in the template, create a new column and color it light blue.
14	Post-sample collection: sample_data	If you have biological replicates (eg, distinct water samples taken from the same Niskin bottle), make sure that you record one unique identifier for the full water sample in source_mat_id and list the corresponding replicates under biological_replicates. Fill in any other sample_data that is known, such as organism, env_broad_scale, env_local_scale, env_medium, geo_loc_name, waterBody, samp_collect_device, samp_mat_process, size_frac, collection_method. Many of these are controlled terms and are the same between projects.
15	Lab preparation: sample_data	When preparing samples for sequencing in the lab, you will generate other information that should be added to sample_data: amplicon_sequenced, dna_conc, concentrationUnit, and extract_number. You will also add a few more samples that are prepared for sequencing, such as extraction blanks and mock communities. Make sure to select the correct sample_type for these samples.
16	Lab preparation: prep_data	The prep_data sheet is organized with one row for each sequencing library prep. sample_name must match the name in sample_data, while library_id should be the name that was given to the sequencing center and should be unique to each sequencing library prep (so different between 16S and 18S preps, for example). amplicon_sequenced must match the name provided to amplicon_sequenced in sample_data. Some of this sheet can be filled out prior to PCR prep, as you will already know the PCR primers and conditions being used. prep_data is split into 2 sections after column M. The 1st section mostly contains controlled vocabulary that is submitted to the NCBI SRA. Do NOT reorganize or change the columns in the 1st section. Make sure to record the date and personnel for DNA extractions and PCRs. You will not have the biosample or SRA accessions until after submitting to NCBI SRA.
17	After sequencing: prep_data	Once you receive sequences back, enter the filenames in prep_data for each library prep. Upload all sequences to a Google Drive location and link that location on the sheet.
18	Analysis: analysis_data	The analysis_data sheet has one row for each amplicon_sequenced. Provide short but descriptive info on software, parameters, and versions used for assembling ASVs and assigning taxonomy. Other types of analyses (such as estimates of diversity) should be provided in a code_repo link.
19	NCBI SRA submission: project_data	Initiate a new NCBI SRA submission. Use the project_data sheet to fill in info for the BioProject if you have not already created a BioProject. If you have an existing BioProject, make sure to add that accession to sample_data.
20	NCBI SRA submission: sample_data	The sample_data sheet will be used directly to create BioSamples in NCBI. We recommend downloading the Google Sheet as an Excel file, then saving the sample_data sheet as its own Excel file. Delete any columns from this file that you do not want on NCBI (such as date_sheet_modified, modified_by, internal notes). Upload the sample_data Excel file.
21	NCBI SRA submission: prep_data	It is easiest to submit SRA data to NCBI in batches based on sequencing run or amplicon_sequenced, but you can also submit all of your data at once. We recommend submitting different markers separately. In the Google Sheet, create a new sheet that is a copy of the SRA_template sheet and name it based on the marker you are
22
23
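The workflow above asks for collection_date in UTC and ISO format, with collection_date_local kept in local time. A minimal sketch of that conversion follows, using only the Python standard library. The function name `local_to_utc_iso`, the `utc_offset_hours` parameter, and the trailing `Z` output style are illustrative assumptions, not part of the template; confirm the exact timestamp format against current NCBI and OBIS guidance before submitting.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_iso(local_timestamp: str, utc_offset_hours: float) -> str:
    """Convert a local ISO 8601 timestamp (no offset) to a UTC ISO 8601 string.

    local_timestamp: value recorded in collection_date_local,
        e.g. "2021-09-14T07:30:00" (hypothetical example value).
    utc_offset_hours: offset of the local clock from UTC at collection time,
        e.g. -4 for US Eastern Daylight Time. Supplied by the user (assumption).
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Attach the local offset to the naive timestamp, then shift to UTC.
    aware = datetime.fromisoformat(local_timestamp).replace(tzinfo=local_tz)
    return aware.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(local_to_utc_iso("2021-09-14T07:30:00", -4))
```

Passing the offset explicitly keeps the sketch free of time zone database dependencies; samples collected late in the local day can land on the next calendar date in UTC, which is worth checking when reviewing dates.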
24	Guidelines
25	Ensure that sample names are consistent between sample_data, prep_data, and asv_data.
26	Ensure that the amplicon names provided in amplicon_sequenced are consistent across sample_data, prep_data, and analysis_data.
27	Do not reorganize or rename columns in the 1st section of prep_data (before column N).
28	Keep the date_sheet_modified and modified_by as the last 2 columns in each sheet. These are set by the custom onEdit function through Apps Scripts.
29	Do not rename headers. If you wish to provide a custom column, you can add it with a cyan header.
30	Do not make edits to the SRA Terms sheet; it is used for the data validation in prep_data.
31	Saving this file as an Excel file may lose some data validation functionality.
32	For blank cells, NCBI only allows 'not collected', 'not applicable' or 'missing'.
33
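Two of the guidelines above lend themselves to an automated check before submission: names must be consistent between sheets, and blank cells must be replaced with one of the values NCBI allows. The sketch below assumes each sheet has been exported as tab-separated text with a header row; the helper names (`read_sheet`, `names_missing_from`, `blank_cells`) and the example rows are hypothetical and only illustrate the idea.

```python
import csv
import io

# The three values the guidelines list as allowed by NCBI for blank cells.
NCBI_MISSING_VALUES = {"not collected", "not applicable", "missing"}


def read_sheet(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def names_missing_from(reference_rows, other_rows, column="sample_name"):
    """Return values of `column` used in other_rows but absent from reference_rows."""
    reference = {row[column] for row in reference_rows}
    return sorted({row[column] for row in other_rows} - reference)


def blank_cells(rows):
    """Return (row_index, column) pairs whose value is empty or whitespace."""
    return [
        (index, column)
        for index, row in enumerate(rows)
        for column, value in row.items()
        if not (value or "").strip()
    ]


# Hypothetical exports of sample_data and prep_data.
sample_rows = read_sheet(
    "sample_name\tamplicon_sequenced\tdepth\n"
    "S1\t16S V4-V5\t5\n"
    "S2\t16S V4-V5\t\n"
)
prep_rows = read_sheet(
    "sample_name\tlibrary_id\tamplicon_sequenced\n"
    "S1\tS1_16S\t16S V4-V5\n"
    "S3\tS3_16S\t16S V4-V5\n"
)

print(names_missing_from(sample_rows, prep_rows))  # sample names only in prep_data
print(names_missing_from(sample_rows, prep_rows, column="amplicon_sequenced"))
print(blank_cells(sample_rows))  # cells that still need an allowed missing value
```

The check only reports blank cells rather than filling them, since choosing between 'not collected', 'not applicable', and 'missing' depends on why the value is absent.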
34
35	Custom AOML terms
36
37	sample_data
38	cruise_id
39	line_id
40	station
41	ctd_bottle_no
42	sample_replicate
43	source_mat_id
44	biological_replicates
45	extract_number
46	serial_number
47	biosample_accession
48	notes_sampling
49	project_id
50	amplicon_sequenced
51	metagenome_sequenced
52	collection_date_local
53	waterBody
54	decimalLatitude
55	decimalLongitude
56	geodeticDatum
57	dna_conc
58	concentrationUnit
59	sample_type
60	basisOfRecord
61	date_sheet_modified
62	modified_by