Dataset Curation & Publication 2: Reality vs. Theory
PI: Maryann Martone
PRECISION Human Pain Network
Overview of process
Register study metadata with HEAL
Upload data + experimental metadata to SPARC
Publish data
View and search data through HEAL
Is the data human whole-genome data or PHI (BAM or FASTQ)?
Yes: Work on metadata with the DCIC Team and provide the accession #
No: Public DOI
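The routing question in the flowchart above can be sketched as a small decision function. This is a minimal illustration under stated assumptions: the extension list (`.bam`, `.fastq`, `.fq`, optionally gzipped), the function names, and the `is_human` / `contains_phi` flags are hypothetical and are not part of any SPARC, HEAL, or DCIC tool. In practice, the Curation Team makes this call together with the contributor.

```python
from pathlib import Path

# Hypothetical: extensions treated as raw sequence data (BAM or FASTQ).
GENOMIC_SUFFIXES = {".bam", ".fastq", ".fq"}


def is_genomic_file(name: str) -> bool:
    """True for BAM or FASTQ files, including gzipped FASTQ (e.g. reads.fastq.gz)."""
    suffixes = [s.lower() for s in Path(name).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]  # look at the extension underneath .gz
    return bool(suffixes) and suffixes[-1] in GENOMIC_SUFFIXES


def publication_route(files: list[str], is_human: bool, contains_phi: bool = False) -> str:
    """Return which branch of the flowchart a dataset follows (hypothetical labels)."""
    human_genomic = is_human and any(is_genomic_file(f) for f in files)
    if human_genomic or contains_phi:
        return "work on metadata with DCIC Team; provide accession #"
    return "public DOI"
```

For example, `publication_route(["sub-1_R1.fastq.gz"], is_human=True)` takes the DCIC branch, while a dataset of tabular behavioral data with no PHI takes the public DOI branch.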
First datasets are now available through HEAL via SPARC
https://healdata.org/portal/discovery
https://doi.org/10.26275/ZQCB-QH3L
Journal requirements vs HEAL requirements
Recent example:
Actions
Metadata
Let’s talk about it…
PRECISION Metadata Standard
Data dictionary
Download both the metadata specification (data dictionary) and the template
Metadata template
The Curation Team checks the completed template and flags any missing fields
Data dictionary
Metadata template
HEAL CDE requirements
Required / optional
[Diagram: Subject 1, Samples 1 and 2, Datasets 1 and 2]
Complete Metadata: Required for Acceptance
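The completeness check described above (required fields come from the data dictionary, and anything missing is flagged) can be sketched in a few lines. This is a minimal sketch, assuming a data dictionary exported as CSV with hypothetical columns `field` and `requirement`; the field names `subject_id`, `sample_id`, `species`, and `age` are illustrative only and are not taken from the PRECISION or HEAL specifications.

```python
import csv
import io

# Hypothetical data-dictionary excerpt: one row per metadata field,
# with a required/optional flag. Not the actual PRECISION dictionary.
DATA_DICTIONARY_CSV = """field,requirement
subject_id,required
sample_id,required
species,required
age,optional
"""


def required_fields(dictionary_csv: str) -> set[str]:
    """Collect the names of fields marked 'required' in the dictionary."""
    reader = csv.DictReader(io.StringIO(dictionary_csv))
    return {
        row["field"]
        for row in reader
        if row["requirement"].strip().lower() == "required"
    }


def missing_required(record: dict[str, str], required: set[str]) -> list[str]:
    """Required fields absent from the record or left blank, sorted for stable output."""
    return sorted(f for f in required if not str(record.get(f, "")).strip())


record = {"subject_id": "sub-1", "sample_id": "", "age": "34"}
print(missing_required(record, required_fields(DATA_DICTIONARY_CSV)))
```

Here `sample_id` is present but blank and `species` is absent, so both are reported; the optional `age` field is never flagged.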
Metadata 3.0?
Best practices: HEAL & PRECISION Metadata Standards
Current Standard: Version 2.0
Comprehensive metadata collection at the point of acquisition prevents costly gaps that cannot be filled retrospectively.
Metadata 3.0?
Metadata Header Standardization: Critical for Data Usability
Why Standardization Matters
Best Practices
Common Pitfalls
"In metadata, precision isn't just a virtue—it's a requirement for scientific reproducibility."
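One way to catch common header pitfalls (stray spaces, inconsistent capitalization, punctuation, duplicated columns) before submission is to normalize headers and compare them with the expected names. This is a sketch under an assumed convention (lowercase `snake_case`); the function names and expected headers are hypothetical, and the check is not a substitute for using the exact headers given in the data dictionary.

```python
import re


def normalize_header(header: str) -> str:
    """Hypothetical convention: trimmed, lowercase, runs of non-alphanumerics -> '_'."""
    h = header.strip().lower()
    h = re.sub(r"[^0-9a-z]+", "_", h)
    return h.strip("_")


def check_headers(headers: list[str], expected: set[str]):
    """Normalize raw headers, then report ones not in `expected` and any duplicates."""
    normalized = [normalize_header(h) for h in headers]
    unknown = [h for h in normalized if h not in expected]
    duplicates = sorted({h for h in normalized if normalized.count(h) > 1})
    return normalized, unknown, duplicates


normalized, unknown, duplicates = check_headers(
    ["Subject ID", "subject_id", "Age (years)", "Sex "],
    {"subject_id", "age_years", "sex"},
)
print(normalized, unknown, duplicates)
```

In this example, "Subject ID" and "subject_id" collapse to the same name and are reported as a duplicate, which is exactly the kind of ambiguity the curation review would otherwise have to resolve by hand.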
Best practice: Consult the data dictionary
HEAL requirement: Data dictionary
Dataset publishing expectations
The Curation Team works on multiple submissions simultaneously
Contributors
Curation Team
ONCE PROTOCOL HAS BEEN SUBMITTED
It can take up to 3 days for the DOI to resolve
ONCE ALL DATA, METADATA HAS BEEN SUBMITTED
Provide INITIAL feedback to contributors
PI responds
Dataset publishing expectations
Summary
Extra slides
Dataset publishing expectations
What happens when the dataset is not public
Dataset publishing tips
https://doi.org/10.26275/pxwy-sric
Dataset publishing tips
Publish data through the SPARC Portal
Upload data + experimental metadata to SPARC
View and search data through HEAL
Metadata will be findable on the HEAL platform, but the data will be published on the SPARC Portal (https://sparc.science/)
Data on Pennsieve
Protocols on protocols.io
Use SODA to prepare data for upload
SPARC Data Submission Steps
Dataset publishing tips