1 of 22

FAIR Assessment in CLARIAH

the interim score

Menzo Windhouwer – Humanities Cluster Digital Infrastructure, KNAW

QiQing Ding – Humanities Cluster Digital Infrastructure, KNAW

Wilko Steinhoff – DANS, KNAW

2 of 22

metadata on FAIR tools & datasets for INEO

  • tools:
    • only partially available, often blended with datasets
    • go get it at the source!
      1. Maarten van Gompel
        • requirements and guidelines
          • allow us to assess the FAIRness of tools
        • harvesting pipeline
  • datasets:
    • two major groups of partners in CLARIAH:
      • CLARIN partners, which are B or C centres in the European CLARIN network and provide metadata to CLARIN's central catalogue, the Virtual Language Observatory
      • Netwerk Digitaal Erfgoed (NDE) partners, which provide metadata to the NDE dataset registry

    • how to assess the FAIRness of these datasets?
      • create a FAIR Implementation Profile (FIP)
        • for CLARIAH as a whole: two years ago it was already too late to do that within the timeframe of CLARIAH+
        • or for CLARIN & NDE, and find their common denominators

7 of 22

CLARIN B Centre requirements

  • CLARIN centre types
    • B: meet all the (technical) requirements
    • K: provides knowledge & support
    • C: "just" provide metadata

  • To become a B centre one has to:
    • hand in a self-assessment of its conformance to the B centre requirements
      • (F) persistent identifiers
      • (F) completeness
      • (A) harvesting
      • (A) single sign-on
      • (I) component metadata
      • (R) IPR
    • which gets reviewed by the CLARIN Assessment Committee
    • next to these CLARIN-specific requirements one also has to acquire a CoreTrustSeal

10 of 22

FAIR Implementation Profile (FIP)

  • together with Enno Meijers (NDE) under the guidance of Shuai Wang and Angelica Maineri (VU, ODDISSEI, SSHOC-NL) we created a first version of a FIP

CLARIAH-FIP-mini-questionnaire

  • CLARIN FIP based on B centre requirements
    • top down, but once upon a time created bottom up
  • NDE FIP resulted mainly in questions
    • very limited number of hard requirements
    • what FAIR means in NDE should come from bottom up!
  • next problem: how to turn the FIP into an actual technical assessment?
    • FAIR is about machine accessibility!
    • there are some generic tools, e.g. F-UJI, which is linked data oriented
      • suitable for NDE, which uses DCAT/schema.org (RDF) for metadata
      • CLARIN uses Component Metadata (XML) as its default for metadata
    • at least agree on an exchange format …
    • but how do we avoid comparing apples with pears across communities? 🍎 != 🍐
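
One way to picture "agreeing on an exchange format" is a single result record that both an XML-based and an RDF-based test could emit, so outcomes can be compared per shared metric rather than per community-specific test. The sketch below is only an illustration: the class `AssessmentResult`, its field names and the metric name `has-persistent-identifier` are hypothetical, and this is not the format under development by OSTrails.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AssessmentResult:
    """Hypothetical community-neutral record for one test outcome."""
    community: str      # e.g. "CLARIN" or "NDE"
    principle: str      # FAIR letter: "F", "A", "I" or "R"
    metric: str         # name of the shared metric (made up here)
    test_language: str  # how the test was expressed, e.g. "XPath" or "SPARQL"
    passed: bool


def to_exchange(results):
    """Serialise results to JSON so any community's tooling can read them."""
    return json.dumps([asdict(r) for r in results], indent=2, sort_keys=True)


# The same metric, assessed with different test languages per community.
results = [
    AssessmentResult("CLARIN", "F", "has-persistent-identifier", "XPath", True),
    AssessmentResult("NDE", "F", "has-persistent-identifier", "SPARQL", False),
]
payload = to_exchange(results)
print(payload)
```

Keeping the test language as a plain field, and comparing only at the level of the metric, is one possible way to keep apples and pears apart while still reporting them side by side.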

12 of 22

OSTrails: EU EOSC project, started in 2024

by Joaquin Lopez - WP4

13 of 22

FAIR assessment based on the CLARIN FIP

  • teamed up with Wilko Steinhoff (DANS) to work ahead of OSTrails (Open Science Plan-Track-Assess Pathways)
    • pyFAT
      • assessment framework, as being developed by OSTrails
      • express tests in a language suitable for your metadata
        • XPath for XML
        • SPARQL for RDF
        • fallback to Python
      • tests assess a metric (=common denominator), which is tied to a FAIR principle
      • scores are combined by a community specific rubric (example)
      • exchange format, as being developed by OSTrails

  • the interim score
    • we can assess CLARIN datasets using a partial FIP (F & A)
    • not yet NDE, but we are closer to experimenting
    • INEO currently still contains a mix of FAIR-assessed datasets and datasets of unknown FAIRness, but this will change in due course
    • nothing about FAIR assessment is set in stone, it's the current frontier!
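
The test, metric and rubric layering described above can be sketched with the Python standard library alone. This is not pyFAT's API: the toy record, its namespace, the metric names and the scoring rule (fraction of passed metrics per principle) are all assumptions made for illustration, and the record is not real Component Metadata.

```python
import xml.etree.ElementTree as ET

# Toy metadata record; element names and namespace are invented for this sketch.
RECORD = """\
<record xmlns="http://example.org/toy-md">
  <header>
    <selfLink>hdl:00000/example</selfLink>
  </header>
  <resources>
    <resource ref="https://example.org/data/corpus.zip"/>
  </resources>
</record>
"""

NS = {"md": "http://example.org/toy-md"}

# Each test: (FAIR principle, hypothetical metric name, XPath expression).
TESTS = [
    ("F", "has-identifier", ".//md:selfLink"),
    ("A", "has-resource-link", ".//md:resource[@ref]"),
    ("A", "has-licence", ".//md:licence"),
]


def run_tests(xml_text):
    """Run every XPath test; a test passes if the expression matches a node."""
    root = ET.fromstring(xml_text)
    return {
        metric: (principle, root.find(xpath, NS) is not None)
        for principle, metric, xpath in TESTS
    }


def rubric(outcomes):
    """Assumed rubric: fraction of passed metrics per FAIR principle."""
    totals = {}
    for principle, passed in outcomes.values():
        ok, n = totals.get(principle, (0, 0))
        totals[principle] = (ok + int(passed), n + 1)
    return {p: ok / n for p, (ok, n) in totals.items()}


outcomes = run_tests(RECORD)
scores = rubric(outcomes)
print(scores)
```

For RDF metadata the same structure would hold with SPARQL queries in place of the XPath expressions; only `run_tests` would need a different evaluator, while the metrics and the rubric stay the same.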

17 of 22

The pipeline to INEO (for datasets)

[Diagram: n CLARIN partners (via OAI) and n NDE partners (via SPARQL) → Filter & merge → 1 → Extract & Ingest → facets]
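
The "Filter & merge" step of this pipeline, where records from many CLARIN and NDE partners end up in one set, could look roughly like the sketch below. The slides do not say how the step actually works, so everything here is an assumption: in-memory lists stand in for the OAI and SPARQL harvests, and the record fields `id` and `title`, the filter rule and the merge key are hypothetical.

```python
def filter_and_merge(clarin_records, nde_records):
    """Drop incomplete records and merge the rest by identifier.

    Assumed rules: a record needs a non-empty "id" and "title"; records
    sharing an "id" are treated as the same dataset, and the sources that
    provided it are remembered.
    """
    merged = {}
    for source, records in (("CLARIN", clarin_records), ("NDE", nde_records)):
        for rec in records:
            if not rec.get("id") or not rec.get("title"):
                continue  # filter: skip records missing required fields
            entry = merged.setdefault(
                rec["id"],
                {"id": rec["id"], "title": rec["title"], "sources": []},
            )
            entry["sources"].append(source)
    return list(merged.values())


# Stand-ins for what an OAI harvest and a SPARQL query might return.
clarin = [
    {"id": "ds-1", "title": "Corpus A"},
    {"id": "", "title": "No identifier"},
]
nde = [
    {"id": "ds-1", "title": "Corpus A"},
    {"id": "ds-2", "title": "Collection B"},
]

datasets = filter_and_merge(clarin, nde)
for d in datasets:
    print(d["id"], d["title"], d["sources"])
```

Recording the contributing sources per dataset is one possible basis for a facet later in the pipeline, though the source does not state which facets INEO derives.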

22 of 22

Let's exchange war stories 🍻

  • What are datasets vs collections vs digital objects vs resources?
  • How to make an OAI harvester harvest a triple store?
  • Try out a JSON database?
  • Sync 1000s of records with a deadline in sight
  • Add RUC for your dataset(s)

Visit us at

the INEO & FAIR demo & poster

at 15:30 in the Spot