1 of 19

Data in humanities and social sciences

Benito Trollip

8-10 March 2023

DH-Ignite

Lord Charles Hotel, Somerset West

benito.trollip@nwu.ac.za

License: CC BY 4.0

2 of 19

STRUCTURE 

  • Introduction
  • What is data?
  • What is metadata?
  • Sharing your data
  • What is open science / open data?
  • Where can data be stored?
  • Licensing
  • What is a repository?
    • SADiLaR’s repository and FAQs

3 of 19

INTRODUCTION

  • Reasons for data awareness and management
    • Value of data
    • Data is labour-intensive to produce
    • Data can be seen as a form of output
    • Reusability
    • Advancing the field

4 of 19

What is data?

5 of 19

WHAT IS DATA? [2]

  • Consider this definition by Harrower et al. (2020):

“We could then define data in the humanities broadly as all materials and assets scholars collect, generate and use during all stages of the research cycle.”

6 of 19

WHAT IS DATA? [3]

  • Data could therefore include:
    • Datasets
      • Corpora, wordlists, frequency lists
      • Interviews, qualitative questionnaire answers
    • Methodology and process
      • Code, methods used, workflow
    • Application(s)
      • Executable files or tools

7 of 19

What is metadata?

8 of 19

SHARING YOUR DATA: FAIR + CARE principles

  • Findable
  • Accessible
  • Interoperable
  • Reusable

+

  • Collective benefit
  • Authority to control
  • Responsibility
  • Ethics

9 of 19

Figure: Box 2 in Wilkinson et al. (2016), the FAIR Guiding Principles

10 of 19

FAIR Data Self Assessment Tool

  • Developed by the Australian Research Data Commons (ardc.edu.au)
  • Assesses your own data’s FAIRness
    • Where the data is available
    • In what type of repository
    • Whether the data is restricted to a specific platform
    • Questions concerning metadata
  • Freely available online

11 of 19

WHAT IS OPEN SCIENCE / OPEN DATA?

  • Consider this definition by Hampton et al. (2015):

“Under the principles of open science, data are generated with the expectation of unfettered public dissemination. This fundamental shift in thinking from “I own the data” to “I collect and share the data on behalf of the scientific community and society” is essential to the transparency and reproducibility of the open science framework. When data are available, discoverable, reproducible, and well-described, scientists can avoid “reinventing the wheel” and instead build directly on those products to innovate.”

12 of 19

Is open data always a must?

13 of 19

LICENSING OF DATA

  • Different licensing options available
  • Creative Commons licensing is an option
    • Different types of CC licences
    • Specify terms of use
      • CC0
      • CC BY
      • CC BY-SA
      • CC BY-ND
      • CC BY-NC
      • CC BY-NC-SA
      • CC BY-NC-ND
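
Each licence code above is built from a small set of elements (BY, SA, NC, ND), and each element adds one condition to the terms of use. The sketch below illustrates this; the function name `describe_licence` and the short wording of each condition are assumptions made for the example, not official Creative Commons text, and the legal code of each licence remains the authoritative source.

```python
# Illustrative sketch only: the condition wording is a paraphrase,
# not official Creative Commons legal text.
ELEMENTS = {
    "BY": "credit must be given to the creator",
    "SA": "adaptations must be shared under the same terms",
    "NC": "only non-commercial use is permitted",
    "ND": "no derivatives or adaptations may be shared",
}


def describe_licence(code: str) -> list[str]:
    """Return the conditions implied by a code such as 'CC-BY-NC-SA'."""
    # Accept both 'CC-BY-SA' and 'CC BY-SA' spellings.
    parts = code.upper().replace(" ", "-").split("-")
    # Drop any version number such as '4.0'.
    parts = [p for p in parts if not p.replace(".", "").isdigit()]
    if parts == ["CC0"]:
        return ["no conditions: the work is dedicated to the public domain"]
    if len(parts) < 2 or parts[0] != "CC":
        raise ValueError(f"not a recognised CC licence code: {code!r}")
    unknown = [p for p in parts[1:] if p not in ELEMENTS]
    if unknown:
        raise ValueError(f"unknown licence element(s): {unknown}")
    return [ELEMENTS[p] for p in parts[1:]]


if __name__ == "__main__":
    for licence in ["CC0", "CC-BY", "CC-BY-NC-SA", "CC BY 4.0"]:
        print(licence, "->", "; ".join(describe_licence(licence)))
```

Reading the codes this way makes the trade-off visible: every added element restricts reuse a little further, which matters when weighing a licence against the Reusable principle in FAIR.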

14 of 19

WHERE AND HOW CAN DATA BE STORED?

  • Analogue - in a steel drawer? With a key?
  • Semi-digitally - on a USB drive or CD-ROM
  • Any other ideas?

15 of 19

THE MANAGING AND STEWARDSHIP OF DATA

  • Control over the content and quality of data is crucial
  • Data management plans are becoming a requirement for funding and publication
  • Data-wrangling skills are becoming imperative for researchers beyond library science (Koltay, 2017)
  • Sensitivity to the importance of accurate metadata is growing (Wiley, 2014)

16 of 19

What is a repository?

17 of 19

SADiLaR’s REPOSITORY

  • Specialist repository with different options
  • SADiLaR’s repository
    • Also hosts other NLP/ML outputs
    • Findable, also on Google Scholar
    • Linked to CLARIN
    • Open-source, free to access
  • Steps to submit to the repository are available here

18 of 19

FAQs

  • Can only people at academic institutions submit to the repository?
  • Is the repository only for datasets / tools?
  • What happens when the submitter / contact person leaves an institution?
  • Do you need to upload a file to submit a resource?
  • Can I edit my own metadata?
  • Do you quality check the data submitted to the repository? Who does it?

19 of 19

THANK YOU FOR YOUR ATTENTION.

PRESENTATION AVAILABLE AT http://bit.ly/3kFiLvv

PLEASE FEEL FREE TO GET IN TOUCH:

benito.trollip@nwu.ac.za

info@sadilar.org