1 of 10

Leveraging Structured Data on the Web to Address FAIR Principles

Doug Fils

Ronin Institute

2 of 10

Topics

References:

Theme:

Leverage web architecture, commodity tooling, and semantics as the platform for publishing and consuming metadata/data. #useTheWeb

Keywords:

3 of 10

Structured Data

Metadata serialized in JSON(-LD)

  • Context
    • Vocabularies: Schema.org, DCAT, GeoSPARQL, PROV, units and measurements
  • Representation
    • JSON-LD (RDF with tooling)

Profile Communities provide guidance on implementation

  • Science on Schema (geo)
  • BioSchema (bio)
  • Darwin Core
  • DCAT Profiles
  • etc.

Benefits

  • Common tooling and skill sets
  • RDF data model with multiple representations
  • Data servers with multiple functional goals
    • Read- or write-optimized
    • Text
    • Spatial
    • Temporal
    • etc.
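As a minimal publisher-side sketch (all names and values below are illustrative placeholders, not from any real dataset), a schema.org Dataset description can be built with standard JSON tooling and wrapped in the script tag that harvesters look for:

```python
import json

# Illustrative schema.org Dataset metadata (every value is a placeholder).
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example ocean temperature profiles",
    "description": "Hypothetical dataset used to illustrate structured data markup.",
    "variableMeasured": "sea water temperature",
}

# Embed the JSON-LD in a script tag, the form consumers expect to find
# in the HTML head of a landing page.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(dataset, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Because the payload is plain JSON, the same metadata can also be served directly (e.g. via content negotiation) without the HTML wrapper.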

4 of 10

Data on the web (the web is your architecture)

Communication is via web architecture, with different proposed levels (a work in progress with CODATA).

  • L1: Structured data on the web: JSON-LD-encoded RDF in script tags, accessed over HTTP. One option here is content negotiation to obtain the JSON-LD directly, without the need for the consumer to parse the HTML. This level also includes the use of robots.txt and sitemap.xml for resource location.
  • L2: Signposting-style link headers in the HTML (or HTTP). This allows the addition of links with relations drawn from the IANA link relations registry. Terms of interest might be: describedby, describes, profile.
  • L3: Content Negotiation by Profile
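On the consumer side, L1 and L2 can be sketched with standard-library tooling. The HTML snippet, URLs, and the deliberately minimal Link-header parser below are illustrative assumptions, not a reference implementation:

```python
import json
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks (L1)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.documents.append(json.loads(data))


# Hypothetical landing page with embedded structured data.
html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Demo"}
</script>
</head><body></body></html>"""

parser = JSONLDExtractor()
parser.feed(html)
print(parser.documents[0]["name"])  # -> Demo


def parse_link_header(value):
    """Tiny single-rel Link header parser for the signposting sketch (L2)."""
    links = {}
    for part in value.split(","):
        target, _, params = part.partition(";")
        rel = params.split('rel="')[1].split('"')[0]
        links[rel] = target.strip().strip("<>")
    return links


links = parse_link_header(
    '<https://example.org/meta.jsonld>; rel="describedby", '
    '<https://example.org/profile>; rel="profile"'
)
print(links["describedby"])  # -> https://example.org/meta.jsonld
```

A production harvester would use a full HTTP client and a robust Link-header parser; the point here is only that both levels rest on commodity web machinery.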

One of the main reasons for this work is to explore scaling the web architecture to large collections of resources, such as samples.

Note: HTTP/2 compressed headers, multiplexing, server push, and full-duplex communication are also being explored to address scaling and incremental indexing.

5 of 10

Implementations

Socio-Technical:

Implementation is really a socio-technical process.

The socio part is alignment to policy and procedure in terms of data schema (profiles) and publication (web architecture).

Google Dataset Search

NSF GeoCODES (DeCODER)

UNESCO ODIS Ocean InfoHub

POLDER Polar Data Discovery Enhancement Research

Canadian Consortium for Arctic Data Interoperability

Internet of Water

WIFIRE

Helmholtz Germany

Australian Research Data Commons

Some of the communities using this approach

6 of 10

Implementations

GleanerIO (https://github.com/gleanerio) is an implementation of this web-architecture-based model. It is not the only one, and given the web architecture base, it is relatively easy to build workflows that leverage it.

A set of OCI containers deployed in an orchestration environment (Docker or Kubernetes), either locally or in the cloud.
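As a hypothetical deployment sketch (service names, images, and ports are placeholders, not the GleanerIO project's actual configuration), such a container set might be composed of an object store for harvested JSON-LD, a triplestore, and the indexer itself:

```yaml
# Illustrative compose file; all images and settings are assumptions.
services:
  triplestore:
    image: example/triplestore:latest   # any SPARQL-capable store
    ports: ["3030:3030"]
  objectstore:
    image: minio/minio                  # harvested JSON-LD lands here
    command: server /data
    ports: ["9000:9000"]
  indexer:
    image: example/harvester:latest     # crawls sitemaps, extracts JSON-LD
    depends_on: [objectstore, triplestore]
```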

7 of 10

Segue to FAIR

We can view these implementation networks in terms of GO FAIR concepts.

The groups can be inspected through personas; let's use these:

Publisher

Indexer

User & Community

So these can be easily seen as FAIR Implementation Networks and also relate to FAIR Implementation Profiles.

The whole process is a continuous workflow of interaction between the various personas.

8 of 10

FAIR Digital Objects (Framework)

In FDOF, we have an identifier record named the FDOF Identifier Record (FDOF-IR), a specific type of metadata containing information about: the object's type; the object's metadata record(s); and the object's location(s). More details can be found in the FAIR Digital Object Framework documentation.

From the FDO – Kernel Attributes & Metadata Version 2.0 we can extract the required items as denoted by a required cardinality of 1 or more.

The above image is a draft alignment of FDO required properties to schema.org types and properties.

Required properties: PID, KernelInformationProfile, digitalObjectType, digitalObjectLocation, digitalObjectPolicy, Etag, dateCreated
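To make the draft alignment concrete, here is one hedged sketch of an FDOF-IR expressed as schema.org-typed JSON-LD. The schema.org property choices and all identifier values are assumptions mirroring the alignment idea, not the actual draft mapping:

```python
import json

# Hypothetical FDOF-IR: FDO kernel attributes mapped onto assumed
# schema.org properties (mapping and values are illustrative only).
fdo_record = {
    "@context": "https://schema.org/",
    "@type": "CreativeWork",
    "identifier": "https://hdl.example.org/20.5000/abc123",   # PID
    "additionalType": "https://example.org/fdo/type/dataset", # digitalObjectType
    "contentUrl": "https://repo.example.org/objects/abc123",  # digitalObjectLocation
    "license": "https://example.org/policy/open",             # digitalObjectPolicy
    "version": 'W/"etag-0001"',                               # Etag
    "dateCreated": "2023-01-15",
    "subjectOf": {                                            # KernelInformationProfile
        "@type": "CreativeWork",
        "identifier": "https://example.org/profiles/kip-v1",
    },
}

# Simple completeness check against the required-cardinality attributes.
required = ["identifier", "additionalType", "contentUrl",
            "license", "version", "dateCreated", "subjectOf"]
missing = [key for key in required if key not in fdo_record]
print(json.dumps({"missing": missing}))
```

A check like this is the kind of validation a profile community could formalize in SHACL rather than ad hoc code.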

9 of 10

Integration: CODATA, DeCODER & Ocean InfoHub

CODATA (Committee on Data of the International Science Council (ISC))

Cross Domain Interoperability Framework (CDIF)

Information Exchange at the Application Level (only part of the CDIF approach)

  • Data Discovery and Assessment
  • Data Access
  • Data Integration and Reuse
    • Core elements for integration include:
      • Space
      • Time
      • Units and variables

Alignment can also be aided by PIDs (DIDs) or content-based addressing (SHAs).

This is more for KG alignment than profile alignment, perhaps.

A possible pattern to support integration leverages SHACL (validation) and JSON-LD Frames (alignment) over the data graphs.
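As one hedged sketch of the SHACL side (the property choices are assumed from common schema.org usage, not taken from the CDIF work), a shape could flag dataset records that are missing the space/time/units core elements:

```turtle
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <https://schema.org/> .
@prefix ex:     <https://example.org/shapes/> .

# Illustrative shape: require the core integration elements on a Dataset.
ex:DatasetCoreShape
    a sh:NodeShape ;
    sh:targetClass schema:Dataset ;
    sh:property [ sh:path schema:spatialCoverage ;  sh:minCount 1 ] ;  # Space
    sh:property [ sh:path schema:temporalCoverage ; sh:minCount 1 ] ;  # Time
    sh:property [ sh:path schema:variableMeasured ; sh:minCount 1 ] .  # Units and variables
```

Records that conform could then be passed through a JSON-LD frame to pull those same elements into a predictable shape for integration.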

10 of 10

Thanks