FAIR Datasets

update

Shared Development Roadmap

Q1-demo 2022

FAIR Datasets in CLARIAH

User story:

As a scholar, I want an overview of datasets, based on their collection metadata, that tells me how each dataset is distributed (e.g. SPARQL endpoint, full-text search API endpoint, RDF or CSV data dump file), which organization publishes it, and under which license it is published, so that I can select datasets relevant to my research and access them via domain portals or central services, by downloading the content myself, or by visiting the publishing organization.
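The metadata the user story asks for (distribution, publisher, license) could be modelled as a simple record. This is a minimal sketch: the class names, field names and the `with_distribution` helper are hypothetical illustrations, not an existing CLARIAH schema (a real registry would more likely follow a vocabulary such as DCAT).

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    # Hypothetical: how the dataset can be accessed, e.g. "SPARQL endpoint",
    # "full-text search API", "RDF dump" or "CSV dump"
    kind: str
    access_url: str  # where this distribution can be reached


@dataclass
class DatasetRecord:
    # Hypothetical record combining the fields named in the user story
    title: str
    publisher: str  # organization that publishes the dataset
    license: str    # license under which the dataset is published
    distributions: list = field(default_factory=list)


def with_distribution(records, kind):
    """Return the records that offer at least one distribution of the given kind."""
    return [r for r in records if any(d.kind == kind for d in r.distributions)]
```

A scholar-facing overview could then answer questions such as "which datasets have a SPARQL endpoint?" with `with_distribution(records, "SPARQL endpoint")`.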

FAIR Datasets Goals

  • Make datasets findable (metadata) via a Data Registry

  • Make datasets accessible (resource) via domain portals (Media Suite, Linguistic Corpus Search, Nederlab, etc.)

  • Stimulate interoperability and reusability
    • Rank datasets on the basis of, e.g., data readiness level, accessibility, and availability of endpoints/APIs
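One way to rank datasets on the criteria listed above is a weighted score. This is a minimal sketch under assumed inputs: the 0 to 4 readiness scale, the weights and the cap on endpoints are all hypothetical choices for illustration, not agreed CLARIAH criteria.

```python
from dataclasses import dataclass

# Hypothetical weights per criterion; they sum to 1.0
WEIGHTS = {"readiness": 0.5, "access": 0.3, "endpoints": 0.2}
MAX_READINESS = 4   # hypothetical data readiness scale 0..4
ENDPOINT_CAP = 3    # more endpoints than this add no extra score


@dataclass
class DatasetProfile:
    name: str
    readiness_level: int     # position on the hypothetical readiness scale
    openly_accessible: bool  # whether the data can be accessed without restrictions
    endpoint_count: int      # number of endpoints/APIs offered


def score(p):
    """Combine the normalised criteria into a single score between 0 and 1."""
    readiness = p.readiness_level / MAX_READINESS
    access = 1.0 if p.openly_accessible else 0.0
    endpoints = min(p.endpoint_count, ENDPOINT_CAP) / ENDPOINT_CAP
    return (WEIGHTS["readiness"] * readiness
            + WEIGHTS["access"] * access
            + WEIGHTS["endpoints"] * endpoints)


def rank(profiles):
    """Return profiles ordered from highest to lowest score."""
    return sorted(profiles, key=score, reverse=True)
```

Capping the endpoint count keeps a dataset with many APIs from outranking one that is more mature but offers fewer access points.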

FAIR Datasets & CLARIAH

INEO:

  • Insight into the datasets available in the CLARIAH infrastructure

SDR:

  • Collecting and storing information about the datasets available in CLARIAH (dataset registry)

Status update

  1. Taking inventory of technical means to harvest metadata in CLARIAH+ (overview)
  2. Taking inventory of metadata formats in CLARIAH+ (overview)
  3. Started on a resilient harvesting pipeline based on current CLARIN/ODISSEI practice (demo)
  4. What does INEO need/expect, and does that (already) matter?
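A resilient harvesting loop could look roughly like the sketch below. It assumes the metadata sources expose OAI-PMH (the protocol CLARIN's metadata harvesting relies on); whether the CLARIAH+ pipeline does the same is an assumption. The function names, the `oai_dc` default prefix and the retry/backoff policy are hypothetical choices. The loop follows OAI-PMH resumption tokens page by page and retries transient network failures with exponential backoff.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_get(url, timeout=30):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_retry(fetch, url, attempts=3, backoff=1.0):
    """Call fetch(url), retrying network errors (OSError) with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * 2 ** attempt)


def harvest(base_url, metadata_prefix="oai_dc", fetch=http_get, backoff=1.0):
    """Yield every <record> element from an OAI-PMH ListRecords harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch_with_retry(fetch, url, backoff=backoff))
        for record in root.iterfind("oai:ListRecords/oai:record", OAI_NS):
            yield record
        # An absent or empty resumptionToken marks the last page
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            return
        # In OAI-PMH, resumptionToken is an exclusive argument: it replaces metadataPrefix
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing `fetch` as a parameter keeps the network layer swappable, so the paging and retry logic can be tested against canned responses.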