1 of 10

Leveraging Structured Data on the Web to Address FAIR Principles

Doug Fils

Ronin Institute

2 of 10

Topics

References:

Theme:

Leverage web architecture, commodity tooling, and semantics as the platform for publishing and consuming metadata/data. #useTheWeb

Keywords:

3 of 10

Structured Data

Metadata serialized in JSON(-LD)

  • Context
    • Vocabularies: Schema.org, DCAT, GeoSPARQL, PROV, units and measurements
  • Representation
    • JSON-LD (RDF with tooling)

Profile Communities provide guidance on implementation

  • Science on Schema (geo)
  • BioSchema (bio)
  • Darwin Core
  • DCAT Profiles
  • etc.

Benefits

  • Common tooling and skill sets
  • RDF data model with multiple representations
  • Data servers with multiple functional goals
    • Read- or write-optimized
    • Text
    • Spatial
    • Temporal
    • etc.
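As a minimal publisher-side sketch (all names and values below are illustrative placeholders, not from any real dataset), a schema.org Dataset description can be built with standard JSON tooling and wrapped in the script tag that harvesters look for:

```python
import json

# Illustrative schema.org Dataset metadata (every value is a placeholder).
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example ocean temperature profiles",
    "description": "Hypothetical dataset used to illustrate structured data markup.",
    "variableMeasured": "sea water temperature",
}

# Embed the JSON-LD in a script tag, the form consumers expect to find
# in the HTML head of a landing page.
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(dataset, indent=2)
    + "\n</script>"
)
print(script_tag)
```

Because the payload is plain JSON, the same metadata can also be served directly (e.g. via content negotiation) without the HTML wrapper.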

4 of 10

Data on the web (the web is your architecture)

Communication is via web architecture, with different proposed levels (a work in progress with CODATA).

  • L1: Structured data on the web: JSON-LD-encoded RDF in script tags, accessed over HTTP. One option here is content negotiation to obtain the JSON-LD directly, without the need for the consumer to parse the HTML. This level also includes the use of robots.txt and sitemap.xml for resource location.
  • L2: Signposting-style link headers in the HTML (or HTTP). This allows the addition of links with relations drawn from the IANA link relations registry. Terms of interest might be: describedby, describes, profile.
  • L3: Content Negotiation by Profile
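On the consumer side, L1 and L2 can be sketched with standard-library tooling. The HTML snippet, URLs, and the deliberately minimal Link-header parser below are illustrative assumptions, not a reference implementation:

```python
import json
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks (L1)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.documents.append(json.loads(data))


# Hypothetical landing page with embedded structured data.
html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Demo"}
</script>
</head><body></body></html>"""

parser = JSONLDExtractor()
parser.feed(html)
print(parser.documents[0]["name"])  # -> Demo


def parse_link_header(value):
    """Tiny single-rel Link header parser for the signposting sketch (L2)."""
    links = {}
    for part in value.split(","):
        target, _, params = part.partition(";")
        rel = params.split('rel="')[1].split('"')[0]
        links[rel] = target.strip().strip("<>")
    return links


links = parse_link_header(
    '<https://example.org/meta.jsonld>; rel="describedby", '
    '<https://example.org/profile>; rel="profile"'
)
print(links["describedby"])  # -> https://example.org/meta.jsonld
```

A production harvester would use a full HTTP client and a robust Link-header parser; the point here is only that both levels rest on commodity web machinery.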

One of the main reasons for this work is to explore scaling the web architecture to large collections of resources, such as samples.

Note: HTTP/2 compressed headers, multiplexing, server push, and full-duplex communication are also being explored to address scaling and incremental indexing.

5 of 10

Implementations

Socio-Technical:

Implementation is really a socio-technical process.

The socio part is alignment to policy and procedure in terms of data schema (profiles) and publication (web architecture).

Google Dataset Search

NSF GeoCODES (DeCODER)

UNESCO ODIS Ocean InfoHub

POLDER Polar Data Discovery Enhancement Research

Canadian Consortium for Arctic Data Interoperability

Internet of Water

WIFIRE

Helmholtz Germany

Australian Research Data Commons

Some of the communities using this approach

6 of 10

Implementations

GleanerIO (https://github.com/gleanerio) is an implementation of this web-architecture-based model. It is not the only one, and given the web architecture base, it is relatively easy to build workflows that leverage it.

A set of OCI containers deployed in an orchestration environment (Docker or Kubernetes), either locally or in the cloud.
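As a hypothetical deployment sketch (service names, images, and ports are placeholders, not the GleanerIO project's actual configuration), such a container set might be composed of an object store for harvested JSON-LD, a triplestore, and the indexer itself:

```yaml
# Illustrative compose file; all images and settings are assumptions.
services:
  triplestore:
    image: example/triplestore:latest   # any SPARQL-capable store
    ports: ["3030:3030"]
  objectstore:
    image: minio/minio                  # harvested JSON-LD lands here
    command: server /data
    ports: ["9000:9000"]
  indexer:
    image: example/harvester:latest     # crawls sitemaps, extracts JSON-LD
    depends_on: [objectstore, triplestore]
```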

7 of 10

Segue to FAIR

We can view these implementation networks in terms of GO FAIR concepts.

The groups can be inspected through personas; let's use these:

Publisher

Indexer

User & Community

So these can be easily seen as FAIR Implementation Networks and also relate to FAIR Implementation Profiles.

The whole process is a continuous workflow of interaction between the various personas.

8 of 10

FAIR Digital Objects (Framework)

In FDOF, we have an identifier record named the FDOF Identifier Record (FDOF-IR), a specific type of metadata containing information about: the object's type; the object's metadata record(s); and the object's location(s). More details can be found in the FAIR Digital Object Framework documentation.

From the FDO – Kernel Attributes & Metadata Version 2.0 we can extract the required items as denoted by a required cardinality of 1 or more.

The above image is a draft alignment of FDO required properties to schema.org types and properties.

Required properties: PID, KernelInformationProfile, digitalObjectType, digitalObjectLocation, digitalObjectPolicy, Etag, dateCreated
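To make the draft alignment concrete, here is one hedged sketch of an FDOF-IR expressed as schema.org-typed JSON-LD. The schema.org property choices and all identifier values are assumptions mirroring the alignment idea, not the actual draft mapping:

```python
import json

# Hypothetical FDOF-IR: FDO kernel attributes mapped onto assumed
# schema.org properties (mapping and values are illustrative only).
fdo_record = {
    "@context": "https://schema.org/",
    "@type": "CreativeWork",
    "identifier": "https://hdl.example.org/20.5000/abc123",   # PID
    "additionalType": "https://example.org/fdo/type/dataset", # digitalObjectType
    "contentUrl": "https://repo.example.org/objects/abc123",  # digitalObjectLocation
    "license": "https://example.org/policy/open",             # digitalObjectPolicy
    "version": 'W/"etag-0001"',                               # Etag
    "dateCreated": "2023-01-15",
    "subjectOf": {                                            # KernelInformationProfile
        "@type": "CreativeWork",
        "identifier": "https://example.org/profiles/kip-v1",
    },
}

# Simple completeness check against the required-cardinality attributes.
required = ["identifier", "additionalType", "contentUrl",
            "license", "version", "dateCreated", "subjectOf"]
missing = [key for key in required if key not in fdo_record]
print(json.dumps({"missing": missing}))
```

A check like this is the kind of validation a profile community could formalize in SHACL rather than ad hoc code.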

9 of 10

Integration: CODATA, DeCODER & Ocean InfoHub

CODATA (Committee on Data of the International Science Council (ISC))

Cross Domain Interoperability Framework (CDIF)

Information Exchange at the Application Level (only part of the CDIF approach)

  • Data Discovery and Assessment
  • Data Access
  • Data Integration and Reuse
    • Core elements for integration include:
      • Space
      • Time
      • Units and variables

Alignment can also be aided by PIDs (DIDs) or content-based addressing (SHAs).

This is more for KG alignment than profile alignment, perhaps.

A possible pattern to support integration leverages SHACL (validation) and JSON-LD Frames (alignment) over the data graphs.
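As one hedged sketch of the SHACL side (the property choices are assumed from common schema.org usage, not taken from the CDIF work), a shape could flag dataset records that are missing the space/time/units core elements:

```turtle
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <https://schema.org/> .
@prefix ex:     <https://example.org/shapes/> .

# Illustrative shape: require the core integration elements on a Dataset.
ex:DatasetCoreShape
    a sh:NodeShape ;
    sh:targetClass schema:Dataset ;
    sh:property [ sh:path schema:spatialCoverage ;  sh:minCount 1 ] ;  # Space
    sh:property [ sh:path schema:temporalCoverage ; sh:minCount 1 ] ;  # Time
    sh:property [ sh:path schema:variableMeasured ; sh:minCount 1 ] .  # Units and variables
```

Records that conform could then be passed through a JSON-LD frame to pull those same elements into a predictable shape for integration.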

10 of 10

Thanks