1 of 9

FAIR Datasets, Vocabularies & Data Stories

Shared Development Roadmap

Q2-demo 2022

2 of 9

FAIR Datasets

[Diagram. Labels: NDE; Partner collection(s); OAI; SPARQL; CLARIAH FAIRy Datasets; Ingest; norm; Summarize; facets; Filter & merge; Linked Data resources; INEO]

3 of 9

Facets for FAIR Datasets

  • Current descriptive set ≅ INEO set + FAIR facets
  • FAIR Implementation Profiles
    • Which FAIR facets do we need?
      • F: PID types/links
      • F: Metadata formats
      • F: search engines for (meta)data
      • A: communication protocol for (meta)data
      • A: AAI for (meta)data
      • A: longevity for metadata
      • I: knowledge representation for (meta)data
      • I: vocabularies for (meta)data
      • I: schemas for (meta)data
      • R: license for (meta)data
      • R: provenance for (meta)data
    • Leading to F-A-I-R “stars”; with “enough stars” a dataset can potentially go to INEO
  • F-UJI
    • Assess the FAIRness of our datasets?
    • Technical hints on how to make our FAIRness explicit
    • How can the registry help the partners' repositories?
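
The F-A-I-R “stars” idea could be sketched roughly as below. This is only an illustration: the facet keys, the one-star-per-filled-facet scoring and the "at least one star per principle" threshold are all assumptions, not an agreed CLARIAH or INEO rule.

```python
# Hypothetical facet keys per FAIR principle, paraphrasing the list above.
FAIR_FACETS = {
    "F": ["pid", "metadata_format", "search_engine"],
    "A": ["protocol", "aai", "metadata_longevity"],
    "I": ["knowledge_representation", "vocabularies", "schemas"],
    "R": ["license", "provenance"],
}


def fair_stars(facets: dict[str, list[str]]) -> dict[str, int]:
    """Count, per principle, how many facets have at least one value."""
    return {
        principle: sum(1 for name in names if facets.get(name))
        for principle, names in FAIR_FACETS.items()
    }


def enough_stars(stars: dict[str, int], minimum: int = 1) -> bool:
    """Assumed rule: every principle needs at least `minimum` stars."""
    return all(count >= minimum for count in stars.values())


# Made-up facet values for one dataset record.
example = {
    "pid": ["handle"],
    "metadata_format": ["CMDI"],
    "protocol": ["OAI-PMH"],
    "vocabularies": [],  # an empty facet earns no star
    "license": ["CC-BY-4.0"],
}
stars = fair_stars(example)
ready_for_ineo = enough_stars(stars)
```

In this example the dataset has no filled I facet, so under the assumed threshold it would not yet qualify.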

4 of 9

Why do we use CMDI as a pivot format?

  • CMDI is an XML-based format where you can combine reusable components into resource type specific profiles
  • It is in use by CLARIN, and 7 of our partners provide their records as CMDI
  • CMDI can easily morph to mimic other metadata formats, e.g. the DCAT of NDE
  • There is software for a central catalogue, i.e. the VLO, including an ingest and normalization pipeline
  • Facets can be adapted into the FAIR dataset facets we need
  • Concept Links provide a bridge to the Linked Data world
  • Lots of knowledge and tools in our community
  • CMDI is not a goal/endpoint/result, just a (convenient) means to deal with the heterogeneity in our community
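
How concept links can drive facets across heterogeneous profiles might look like the sketch below. In CMDI the concept links are declared in the profile and component definitions; here a plain lookup table stands in for that, and all element names, concept URIs and facet names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical: element name -> concept link URI (normally taken from the profile).
CONCEPT_LINKS = {
    "Licence": "http://example.org/concept/license",
    "Rights": "http://example.org/concept/license",
    "Vocabulary": "http://example.org/concept/vocabulary",
}

# Hypothetical: concept URI -> facet name in the registry.
CONCEPT_TO_FACET = {
    "http://example.org/concept/license": "license",
    "http://example.org/concept/vocabulary": "vocabularies",
}


def extract_facets(record_xml: str) -> dict[str, list[str]]:
    """Collect facet values from every element whose concept maps to a facet."""
    facets: dict[str, list[str]] = {}
    for element in ET.fromstring(record_xml).iter():
        concept = CONCEPT_LINKS.get(element.tag)
        facet = CONCEPT_TO_FACET.get(concept)
        text = (element.text or "").strip()
        if facet and text:
            facets.setdefault(facet, []).append(text)
    return facets


# Two records from different (made-up) profiles end up in the same facet.
record_a = "<ProfileA><Licence>CC0</Licence></ProfileA>"
record_b = "<ProfileB><Rights>CC-BY</Rights><Vocabulary>AAT</Vocabulary></ProfileB>"
facets_a = extract_facets(record_a)
facets_b = extract_facets(record_b)
```

The point of the design is that the facet mapping refers to concepts rather than to element names, so a new profile only needs concept links to be picked up.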

5 of 9

What’s new?

  • Harvester
    • Speak SPARQL to the NDE dataset registry to get the records and turn them into CMDI
  • Resource Summarizer
    • Find used Schemas and Vocabularies
    • Also for closed data? e.g. run internally & provide reports for facets (no sensitive data)
    • On the roadmap for LD, but to be extended for more resource types
  • Registry endpoints
    • OAI for the full records
    • LD for the facets
  • INEO sideloader
    • Merge current metadata with static descriptive info, e.g. screenshots
    • Editor for the descriptive info
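
The harvester step (SPARQL results turned into records) might be sketched as follows. The query assumes the registry exposes DCAT descriptions, and the `Record` element and its children are placeholder names, not an actual CMDI profile; the response is a hand-written sample in the standard SPARQL JSON results layout, so no endpoint is contacted.

```python
import xml.etree.ElementTree as ET

# Assumed query shape; the real registry may use different classes and properties.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?publisher WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
  OPTIONAL { ?dataset dct:publisher ?publisher }
}
"""


def bindings_to_records(results: dict) -> list[ET.Element]:
    """Turn each SPARQL JSON result row into one flat XML record."""
    records = []
    for row in results["results"]["bindings"]:
        record = ET.Element("Record")  # hypothetical element name
        for var in results["head"]["vars"]:
            if var in row:  # OPTIONAL variables may be unbound
                ET.SubElement(record, var).text = row[var]["value"]
        records.append(record)
    return records


# Hand-written sample response with made-up values.
sample = {
    "head": {"vars": ["dataset", "title", "publisher"]},
    "results": {"bindings": [
        {
            "dataset": {"type": "uri", "value": "https://example.org/dataset/1"},
            "title": {"type": "literal", "value": "Example collection"},
        },
    ]},
}
records = bindings_to_records(sample)
xml_text = ET.tostring(records[0], encoding="unicode")
```

A real harvester would send `QUERY` to the registry's SPARQL endpoint and map the bindings onto the components of a proper CMDI profile instead of this flat record.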

6 of 9

FAIR Vocabularies

  • To be FAIR, datasets need to make explicit their
    • Knowledge representation
    • Vocabularies
    • Schemas
    • We use a looser LD view on vocabularies, which includes all of these
      • However, still with an LD focus, which we should try to loosen now or in SSHOC.NL
  • CLARIAH helps to
    • Find vocabularies recommended by the community
      • Positive or negative reviews
      • Crawled from a repository/registry, e.g.
      • Popularity with known datasets, as assessed by the summarizer
    • Retain access to vocabulary versions by caching
    • Access vocabularies in a common way by a common API
    • Crosswalk between vocabularies by linksets
    • Alternative hierarchical overlays for vocabularies
  • We did trials with a local copy of Bartoc filled by crawls from Awesome Humanities and YALC
    • Now working on a more suitable data model for our own registry
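
One way the summarizer could assess vocabulary popularity across known datasets is sketched below. It assumes the summarizer yields, per dataset, the list of property IRIs it found, and it approximates a vocabulary by the IRI namespace; both are simplifications, and the dataset names are made up.

```python
from collections import Counter


def namespace_of(iri: str) -> str:
    """Cut an IRI after its last '#' or '/' to approximate the vocabulary namespace."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def vocabulary_popularity(datasets: dict[str, list[str]]) -> Counter:
    """Count in how many datasets each namespace occurs at least once."""
    popularity: Counter = Counter()
    for iris in datasets.values():
        # A set ensures each namespace is counted once per dataset.
        popularity.update({namespace_of(iri) for iri in iris})
    return popularity


# Hypothetical summarizer output for two datasets.
summaries = {
    "dataset-a": [
        "http://purl.org/dc/terms/title",
        "http://purl.org/dc/terms/creator",
        "http://www.w3.org/2004/02/skos/core#prefLabel",
    ],
    "dataset-b": ["http://purl.org/dc/terms/title"],
}
popularity = vocabulary_popularity(summaries)
```

Counting datasets rather than triples keeps one very large dataset from dominating the ranking.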

7 of 9

Data Stories

  1. WP4: The 'Spanish' Flu Epidemic in The Netherlands 1918-19
    • Hosted by DRUID
  2. WP5: 15 years of the popular Dutch chat show 'DWDD' - in data
    • Hosted by the MediaSuite

Merged their workflows into one

Wrote a spec/schema for the Data Stories, incl. sample instances

Started on the requirements for/design of the editor

Started on open source visualisation plugins, e.g. geo and superset

8 of 9

CLARIAH+ SDR vision

[Diagram. Labels: Data Stories; Data stories editor / Yasgui; Jupyter notebook; Visualisation module; Search module; Small Scale Analysis module; Large Scale Analyses; (DANE); Interactive; Webpage / Enhanced Publication; public git repos; Open datasets; Closed datasets; Open APIs; Closed APIs; FAIR datasets; FAIR tool discovery; Distribution & deployment; Storage / Archive; runtime; verification; Authenticated users; User-friendly]

9 of 9

Data Story editor