Leveraging Structured Data on the Web to Address FAIR Principles
Doug Fils
Ronin Institute
Topics
References:
Theme:
Leverage web architecture, commodity tooling, and semantics as the platform for publishing and consuming metadata/data. #useTheWeb
Keywords:
Structured Data
Metadata serialized in JSON(-LD)
Profile Communities provide guidance on implementation
Benefits
Data on the web (the web is your architecture)
Communication is via web architecture with different proposed levels (work in progress with CODATA).
One of the main reasons for this work is to explore how well the web architecture scales to large collections of resources, such as samples.
Note: HTTP/2 features (compressed headers, multiplexing, server push, and full-duplex communication) are also being explored to address scaling and incremental indexing.
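To make "the web is your architecture" concrete, here is a minimal sketch of both sides of the exchange: a publisher embeds a JSON-LD description in an HTML landing page, and a consumer pulls it back out. The example.org identifiers, the field values, and the function names (render_landing_page, extract_jsonld) are hypothetical and only for illustration. It uses the Python standard library only.

```python
import json
from html.parser import HTMLParser

# Hypothetical dataset description using schema.org terms; all values are placeholders.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/id/sample/123",
    "name": "Example sample record",
    "description": "Placeholder description for illustration only.",
    "url": "https://example.org/sample/123",
}


def render_landing_page(doc: dict) -> str:
    """Publisher side: embed the JSON-LD document in an HTML landing page."""
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return (
        "<html><head>\n"
        '<script type="application/ld+json">\n' + payload + "\n</script>\n"
        "</head><body><h1>Example sample record</h1></body></html>"
    )


class JsonLdExtractor(HTMLParser):
    """Consumer side: collect every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


if __name__ == "__main__":
    page = render_landing_page(dataset)
    print(extract_jsonld(page)[0]["name"])
```

The round trip needs nothing beyond an HTML page and a JSON parser, which is the point of relying on commodity web tooling.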
Implementations
Socio-Technical:
Implementation is really a socio-technical process.
The socio part is alignment on policy and procedure in terms of data schema (profiles) and publication (web architecture).
Some of the communities using this approach:
Google Dataset Search
NSF GeoCODES (DeCODER)
UNESCO ODIS Ocean InfoHub
POLDER Polar Data Discovery Enhancement Research
Canadian Consortium for Arctic Data Interoperability
Internet of Water
WIFIRE
Helmholtz Germany
Australian Research Data Commons
Implementations
GleanerIO (https://github.com/gleanerio) is one implementation of this web-architecture-based model. It is not the only one, and given the web architecture base, it is relatively easy to build workflows that leverage it.
It is a set of OCI containers deployed in an orchestration environment (Docker or Kubernetes), either locally or in the cloud.
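As an illustration of how small such a workflow can be, the sketch below selects which resources an indexer should revisit, assuming the publisher advertises them in a standard sitemap with optional lastmod dates. This is not GleanerIO's code; the function name urls_modified_since and the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Placeholder sitemap; a real indexer would fetch this over HTTP.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/sample/1</loc><lastmod>2023-01-10</lastmod></url>
  <url><loc>https://example.org/sample/2</loc><lastmod>2023-06-02T08:00:00+00:00</lastmod></url>
  <url><loc>https://example.org/sample/3</loc></url>
</urlset>"""


def urls_modified_since(sitemap_xml: str, since: date) -> List[str]:
    """Return the URLs whose lastmod date is on or after `since`.

    Entries without a lastmod are always returned, because the indexer
    cannot tell whether they have changed.
    """
    root = ET.fromstring(sitemap_xml)
    selected = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        if loc is None:
            continue
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        # lastmod may be a date or a full timestamp; compare on the date part only.
        if lastmod is None or date.fromisoformat(lastmod.strip()[:10]) >= since:
            selected.append(loc.strip())
    return selected


if __name__ == "__main__":
    for url in urls_modified_since(EXAMPLE_SITEMAP, date(2023, 3, 1)):
        print(url)
```

Filtering on lastmod is one simple way to approach incremental indexing: only resources that changed since the last run are fetched again.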
Segue to FAIR
We can view these implementation networks in terms of GO FAIR concepts.
The groups can be examined through personas; let's use these:
Publisher
Indexer
User & Community
These can readily be seen as FAIR Implementation Networks, and they also relate to FAIR Implementation Profiles.
The whole process is a continuous workflow of interaction between the various personas.
FAIR Digital Objects (Framework)
In the FDOF, the FDOF Identifier Record (FDOF-IR) is a specific type of metadata containing information about the object's type, the object's metadata record(s), and the object's location(s). More details can be found in the FAIR Digital Object Framework documentation.
From the FDO Kernel Attributes & Metadata specification, version 2.0, we can extract the required items, that is, those with a required cardinality of 1 or more.
The above image is a draft alignment of the FDO required properties to schema.org types and properties.
Required properties: PID, KernelInformationProfile, digitalObjectType, digitalObjectLocation, digitalObjectPolicy, Etag, dateCreated
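The required list above lends itself to a simple completeness check. The sketch below assumes a record is a flat Python dictionary keyed by the attribute names exactly as listed (with the profile name written as one word); that layout, the function name missing_kernel_attributes, and the placeholder values are assumptions for illustration, not something defined by the FDO documents.

```python
# Required kernel attributes as listed above (cardinality of 1 or more).
REQUIRED_KERNEL_ATTRIBUTES = (
    "PID",
    "KernelInformationProfile",
    "digitalObjectType",
    "digitalObjectLocation",
    "digitalObjectPolicy",
    "Etag",
    "dateCreated",
)


def missing_kernel_attributes(record: dict) -> list:
    """Return the required attribute names that are absent or empty in `record`."""
    missing = []
    for name in REQUIRED_KERNEL_ATTRIBUTES:
        value = record.get(name)
        # Treat None, empty strings, and empty lists as "not provided".
        if value is None or value == "" or value == []:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical record with placeholder values; two required items are left out.
    record = {
        "PID": "https://example.org/id/sample/123",
        "KernelInformationProfile": "https://example.org/profiles/kernel",
        "digitalObjectType": "https://schema.org/Dataset",
        "digitalObjectLocation": ["https://example.org/sample/123"],
        "dateCreated": "2023-06-02",
    }
    print(missing_kernel_attributes(record))
```

A check like this only confirms presence. Whether each value is well formed is a separate validation question.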
Integration: CODATA, DeCODER & Ocean InfoHub
CODATA (Committee on Data of the International Science Council (ISC))
Cross Domain Interoperability Framework (CDIF)
Information Exchange at the Application Level (only part of the CDIF approach)
Alignment can also be aided by PIDs (DIDs) or content based addressing (SHAs).
More for KG alignment than profile alignment, perhaps.
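Content-based addressing can be sketched as follows: serialize a metadata record deterministically, then hash it so that identical content always yields the same identifier. This sketch only sorts JSON keys, which is a simplification; it is not RDF dataset canonicalization, so two JSON-LD documents that describe the same graph with different syntax would still hash differently. The function name content_address and the "sha256:" prefix are hypothetical.

```python
import hashlib
import json


def content_address(doc: dict) -> str:
    """Return a SHA-256 based identifier for a JSON-serializable document."""
    # Sorted keys and fixed separators make the byte stream independent of
    # dictionary ordering and whitespace choices.
    canonical = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder records: same content, different key order.
    a = {"@type": "Dataset", "name": "Example sample record"}
    b = {"name": "Example sample record", "@type": "Dataset"}
    print(content_address(a) == content_address(b))
```

Two indexers that harvest the same record independently would arrive at the same address, which is what makes this useful for aligning knowledge graphs.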
Possible pattern to support integration approaches leveraging SHACL (validation) and JSON-LD Frames (alignment) for data graphs.
Thanks