Guidelines & Tips: Uploading Data to AmphibianDisease.org
(updated July 2021, MK)
Link to this page: https://bit.ly/ADP_DataTips
The easiest way to get started is to download the data template (XLSX file) PRIOR to data collection to become familiar with expected and best data practices.
Follow the directions here: https://amphibiandisease.org/contributing to create a project or skip to Download the Template: https://amphibiandisease.org/contributing/#template-data
If you do not have a project, you can also choose the Amphibian Disease project on GeOME which is a public project and has templates anyone can download.
Quick links to content below:
Diving into the Samples of the xlsx template
If your project uses museum specimens
If your project is based on field-caught or captive individuals
Excel Tip: Dealing with Dates
Excel Tip: Trailing spaces
Troubleshooting Geome-DB
Open Refine Tools
Keep reading for specific use cases and how to use specialized data templates.
Tip: you can generate your own template from the AmphibiaWeb Template generator, see any field definition by clicking on DEF
See below for preselected templates for specific types of datasets that are available for downloading.
Also, be sure to read GeOME’s FAQ page
The rest of this document will help you navigate specific fields, the requirements and best practices to make your data as useful as possible:
Diving into the Samples of the xlsx template
- First tip: fields in red are required and need to be filled on the Samples tab. The fields in Black are optional: that does not mean they are not important (please dont ignore them!). In many cases, depending on the project, they are just as critical but they simply are not used in everyone’s project. See basisofRecord below.
- materialSampleID - this can be confusing; it simply has to a completely unique alphanumeric without any hyphens or slashes. You can find and replace those characters with underscores. Typically you already have this information as another name for lab or field identifiers, as long as it is unique.
- These non-URI characters cannot be used: hyphens (-), slash (/), commas ( , ), spaces ( ), pipes (|) but these are allowed in other fields as separators
- You can replace with underscores ( _ ), which are allowed
- coordinateUncertaintyInMeters - if the coordinates are from a GPS and you do not have the error at the time of recording it, simply use ‘30’ meters. If the coordinates are from your field work, you could also use Google Maps and the measure functions (from right-clicking on the map) to estimate.
- Taxonomy: genus, specificEpithet. These are required and should follow AmphibiaWeb’s taxonomy. If older names are used, then we will check and replace with its synonyms where possible. We do not ask for other higher taxonomic names such as Family, Order, Class as we fill that in during data migration to the web services. If you are unsure of spelling, you can look it up on AmphibiaWeb. Contact amphibiaweb@berkeley.edu if you have any questions as we improve the taxonomy list constantly.
- taxonRemarks - may be used to indicate a missing specificEpithet (e.g., “species unknown”) or if only the identifiable to Family (e.g., “Ranidae”) or synonyms.
- individualCount and diseaseTestedPositiveCount (optional)
- Most of the time the default of the first field will be 1 for a single swabbed individual. There may be groups or lots of tadpoles where the specific swab or sample is a combination of individuals, which is when this may be higher. The second field usually defaults to whether diseaseDetected = TRUE or FALSE (1 or 0) or may be higher in the case of lots. Together these fields may be used by researchers aggregating data to calculate prevalence.
- collectorList - use pipes ( | ) between series of names, not commas, to separate
If your project uses museum specimens,
then basisofRecord = PreservedSpecimen. If so, then the following optional fields are critical to linking to the original source of data:
- institutionCode - this is the museum’s acronym or abbreviation. If you dont know it you can look it up
- Example: California Academy of Science = CAS
- Example: Museum of Vertebrate Zoology = MVZ
- collectionCode - this is the type of collection, and typically is 3-4 letters.
- Example: HERP
- catalogNumber - this is a number which identifies the specimen within the collection
- Example: 12345
Together these fields can be used as a global identifier (also called ‘Darwincore triple’) allowing the disease results to link to other databases (e.g., GBIF)
Example: MVZ:HERP:12345
Which in turn, now can form a url:
https://arctos.database.museum/guid/MVZ:HERP:12345
(try it!)
Data Template to download: under Template config, use the dropdown to select museumSpecimen_samples for a preconfigured XLS workbook
If your project is based on field-caught or captive individuals,
then basisofRecord = LivingSpecimen In field work, the location information becomes even more critical as you may not have a record of it elsewhere. For captive individuals, measurements may be critical. These fields may be useful:
- locality - already required
- locationID - some projects may have designated sites or traplines with their own IDs;
- locationRemarks - other explanation may be needed to annotate the locationID (eg. within a trapline, there may be more than one GPS reading available, like endpoints)
- habitat - this is a controlled vocabulary list (see the tab Lists for the acceptable terms); if you choose not to use you can use eventRemarks for any habitat notes
- eventRemarks - free text description of any site comments
- establishmentMeans = values should be: wild or captive
- specimenDisposition = released (other options are collected, preserved, captive, captive-preserved)
- weight, weightUnits, length (snout-vent not tail!), lengthUnits - these may be used if recorded in the field
- occurrenceRemarks = use to record any other measurements or notes about the specimen, like lesions noted
Data Template to download: under Template config, use the dropdown to select fieldSpecimen_samples for a preconfigured XLS workbook
- Tip: if you are not going to use a field in BLACK, which are optional, just delete it to trim your Samples sheet! Be sure not to delete any fields in RED. You can also set this on Geome when you generate new spreadsheets, by only checking fields you want to include in your template, then saving the template to your project.
EXCEL TIP: Dealing with Dates or how to parse into Year, Month Day collected?
Often the verbatim date field will have something that Excel can recognize and format as a date field, eg. 1/5/2020
It is fine to keep it in the format it was originally entered (hence verbatimEventDate)
However ideally we will separate out that field into: yearCollected, monthCollected, dayCollected
To do this you can insert a few blank columns next to the date field to rework the data. The final product will look something like this-- where the original date is in col T and new fields are inserted to the left R and S which have been prepared so only day and month values are displayed (there is another column just for the yearCollected but it is not shown):
How to do this?
Step1: in the blank column, in this example column S, use this formula to show only the day
=day(T2)
If you hit return and it works correctly, you should see the number “28”
Step 2: Copy that cell and then Paste Special as a Formula for the rest of the cells for the rows with data.
Step 3: you are not finished yet! Now you must convert Col S into a field with values and not formulas. The easiest way to do that is to click on the letter S to select the whole column then Paste Special as Values in a fresh column. Clean up/delete the column with the formulas and ensure the field with Values is named dayCollected
You are done! Repeat with column R using this formula for monthCollected:
=month(T2)
Repeat one more time so that you have a column for yearCollected using this formula:
=year(T2)
EXCEL TIP: Trailing Spaces, and how to remove them
A trailing space in a spreadsheet is cell value that starts with a space (not a letter or number) which can then mislead the database into thinking it is a unique value. For the most part the database will remove these when uploading the data but trailing spaces can interfere with your troubleshooting and cleaning of data before you upload. You can remove these in a column with these steps:
- Insert a blank column next to your target column (e.g., B1)
- In the blank column (e.g., C1), use this formula
=TRIM(B1)
- This new formula will copy the values of the B column without the trailing spaces; copy it for the whole column. Then copy the C column values and paste into either the B column or a new one with Paste Special and choose VALUES. Then clean up (i.e., delete the extra columns). You’re done.
Troubleshooting Geome-DB
Error in fields when validating or uploading datasets-- you are sure that you have all the required fields and there are no null (empty) fields. What else could it be?
Field names are case sensitive so check the field in the Error message to make sure if matches the Template
(e.g., Country capitalized will give an error but country lowercase will be fine!) (https://geome-db.org/workbench/template).
Open Refine Tools
Using OpenRefine to clean messy data will be a life-changer if you are sick of Excel!
Install OpenRefine.
Check out OpenRefine Tutorial for a head start or refresher.
Duplicate Facet
materialSamplesID must be a unique value!
Use Duplicate Facet to check this column:
The duplicates facet allows you to narrow your data based on whether or not the value in a particular column is unique.
To use this facet:
- Go to the column materialSamplesID and click the arrow button on the column header.
- Choose “Facet” and then select “Customized facets.”
- Under “Customized facets,” select “Duplicates facet” and a window will appear in the pane to the left side of the grid view.
Results:
- The “Duplicates facet” function returns true or false values where “true” means the value is not unique and “false” means the value is unique.
- This function is useful for isolating records with the same value. Click on “true” to see all the non-unique values. Then you can append an underscore and sequential number to make sure you have only unique values. Example: 1234 (repeated twice) can become 1234_1, 1234_2. Use underscores as hyphens are not allowed!