Introduction to Data Curation in the Humanities:
Managing Tabular Data
Tierney Gleason
Reference & Digital Humanities Librarian
tgleason11@fordham.edu
Updated December 16, 2022
What is the point of this workshop?
Proper understanding of digital scholarship requires an acknowledgement of its entropic nature; the absence of forward planning implies a misunderstanding of the object being produced at a fundamental – perhaps ontological – level.
James Smithies, Carina Westling, Anna-Maria Sichani, Pam Mellen, and Arianna Ciula. "Managing 100 Digital Humanities Projects: Digital Scholarship & Archiving in King's Digital Lab." DHQ: Digital Humanities Quarterly 13, no. 1 (2019).
What are data in the humanities?
Image by Leimenide via Flickr Commons under a CC BY-SA 2.0 license https://creativecommons.org/licenses/by-sa/2.0/
The NEH defines humanities data as “materials generated or collected during the course of conducting research.”
Examples could include:
Humanities data per the DH Curation Guide:
1 Julia Flanders and Trevor Muñoz, "Introduction to Humanities Data Curation." DH Curation Guide. Accessed September 23, 2019. https://guide.dhcuration.org/contents/intro/.
Digital vs. Data Curation
Digital Curation → Involves maintaining, preserving and adding value to digital research data throughout its lifecycle.
Data Curation → Expands upon the idea of digital curation to include “capturing and preserving not only the data itself, but information about the methods by which it was produced.”
Data curation in the humanities combines:
2 Julia Flanders and Trevor Muñoz, "Introduction to Humanities Data Curation." DH Curation Guide. Accessed September 23, 2019. https://guide.dhcuration.org/contents/intro/.
Image credit: Absolutely Free Clipart
The end goal of data curation:
3 Trevor Muñoz and Allen Renear, "Issues in Humanities Data Curation." Center for Informatics Research in Science and Scholarship (CIRSS), University of Illinois, Urbana-Champaign, 2011. Accessed cirss.ischool.illinois.edu/paloalto/whitepaper/premeeting/.
Data curation & interpretation example
→ Data transcription of all those living at St. John’s College during the Seventh Census of the United States, 1850
1850 United States Federal Census Seventh Census of the United States, 1850; (National Archives Microfilm Publication Census Place: West Farms, Westchester, New York; Roll: M432_615; Pages: 288A – 290A; Records of the Bureau of the Census, Record Group 29; National Archives, Washington, D.C.
Example cont’d
Images: Courtesy of Fordham University Libraries – Archives and Special Collections via the Internet Archive
Getting Started with Data Curation
Image by cuatrok77 via Flickr Commons under a CC BY-SA 2.0 license https://creativecommons.org/licenses/by-sa/2.0/
Copyright & Data Curation
As you mix original research with existing sources when curating data, be mindful of copyright considerations:
Keep a Project Notebook
Tips for Structuring Tabular Data
Assigning identifiers
For records 1-10 → 01-10
For records 1-100 → 001-100
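The zero-padding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `pad_identifiers` is hypothetical, and it assumes identifiers are plain positive integers whose padding width is set by the largest record number.

```python
def pad_identifiers(record_numbers):
    """Return zero-padded string identifiers sized to the largest record number."""
    numbers = list(record_numbers)
    # Width is the digit count of the largest number: 10 -> 2 digits, 100 -> 3 digits.
    width = len(str(max(numbers)))
    return [str(n).zfill(width) for n in numbers]


ids = pad_identifiers(range(1, 11))
print(ids[0], ids[-1])  # 01 10
```

Padding identifiers to a uniform width keeps them in the expected order when a spreadsheet or file browser sorts them as text.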
Structuring header row of columns
Organizing Cells
Develop a controlled vocabulary
Consistency is key when structuring data.
Image via Openclipart.org
Avoid punctuation & symbols
# $ / : , * @ % [ } ! ?
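One way to check a header row against this advice is to flag any column name that contains something other than letters, digits, or underscores. The sketch below is a rough illustration: the function name `problem_headers` and the sample column names are hypothetical, and treating underscores as acceptable (and spaces as a problem) is an assumption rather than a rule from this workshop.

```python
import re


def problem_headers(headers):
    """Return headers containing anything other than letters, digits, or underscores."""
    # \w matches letters (including accented ones), digits, and the underscore.
    return [h for h in headers if not re.fullmatch(r"\w+", h)]


headers = ["theatre_name", "opened?", "cost ($)", "street_address"]
print(problem_headers(headers))  # ['opened?', 'cost ($)']
```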
Formatting numbers & dates
YYYY-MM-DD → 2019-09-24
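Dates recorded in another style can be converted to the YYYY-MM-DD pattern with Python's standard library. This is a minimal sketch: the function name `to_iso_date` is hypothetical, and it assumes the source dates are written as MM/DD/YYYY, which will not hold for every dataset.

```python
from datetime import datetime


def to_iso_date(text, source_format="%m/%d/%Y"):
    """Parse a date string in the given format and return it as YYYY-MM-DD."""
    return datetime.strptime(text, source_format).date().isoformat()


print(to_iso_date("09/24/2019"))  # 2019-09-24
```

Historical sources often carry partial or uncertain dates, which a strict parser like this will reject; those cases need a documented project decision rather than an automatic conversion.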
Avoid color coding
Managing blank cells
UTF-8 Encoding
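When a table is saved from a script rather than a spreadsheet program, the encoding can be stated explicitly. The sketch below writes a small CSV as UTF-8 and reads it back; the file name `example.csv` and the sample rows are placeholders chosen for illustration.

```python
import csv
import os
import tempfile

# Sample rows with a non-ASCII character to show that it survives the round trip.
rows = [["id", "name"], ["01", "Trevor Muñoz"]]
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# Write the table with an explicit UTF-8 encoding.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read it back with the same encoding and compare.
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```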
One table per sheet
File naming conventions
Versioning & write-protection
NYCtheatresHarrison_20190924v5.csv
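The example file name above appears to combine a descriptive stem, a date in YYYYMMDD form, and a version number. A small helper can build names in that shape consistently; the function name `versioned_filename` and its parameters are hypothetical, and the pattern is inferred from the single example rather than from a stated rule.

```python
from datetime import date


def versioned_filename(stem, on, version, extension="csv"):
    """Build a file name shaped like stem_YYYYMMDDvN.extension."""
    return f"{stem}_{on:%Y%m%d}v{version}.{extension}"


print(versioned_filename("NYCtheatresHarrison", date(2019, 9, 24), 5))
# NYCtheatresHarrison_20190924v5.csv
```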
Data storage
Ensure research data is backed up in more than one place:
Preparing Humanities Data for Publication
Image by cuatrok77 via Flickr Commons under a CC BY-SA 2.0 license https://creativecommons.org/licenses/by-sa/2.0/
Preparing files for publication
Data documentation for repositories
Examples of data documentation found in repositories:
Humanities Data Curation Record (HDCR)
HDCR supports reuse & reproducibility by:
4 Thomas Padilla and Brandon Locke, "Humanities Data Curation Record." GitHub. Last modified July 5, 2017. Accessed September 23, 2019. https://github.com/datapraxis/hdcr.
HDCRs aim to help researchers:
5 Thomas Padilla and Brandon Locke, "Humanities Data Curation Record." GitHub. Last modified July 5, 2017. Accessed September 23, 2019. https://github.com/datapraxis/hdcr.
Use Project Notebook Content for an HDCR
What would you add to the HDCR template?
6 Thomas Padilla and Brandon Locke, "Humanities Data Curation Record." GitHub. Last modified July 5, 2017. Accessed September 23, 2019. https://github.com/datapraxis/hdcr.
Publishing research data
Recommended data repositories
We hope this workshop was helpful!
Image via Openclipart.org