1 of 21

Federated Infrastructure Updates and FAIR Digital Objects

National Science Data Fabric Bi-annual meeting

Kevin Coakley

Christine Kirkpatrick

This work is funded by the National Science Foundation, award 2138811

National Science Data Fabric

2 of 21

Overview

  • Use Case Update: OSG & XENON interaction
  • CI Partner Update: Open Storage Network (OSN)
  • Opportunities for NSDF
  • FAIR Landscape Developments
    • FAIR Digital Objects
  • Future Work for NSDF


3 of 21

Use Case Update:

Open Science Grid & XENON


4 of 21

NSDF Integration with OSG


5 of 21

NSDF Integration with XENON


6 of 21

CI Partner Update:

Open Storage Network (OSN)


7 of 21

New OSN Sites

Original Sites

  • JHU
  • NCSA
  • Northwestern
  • MGHPCC
  • RENCI
  • SDSC

New Sites

  • AAMU
  • U Maine
  • URI
  • NY American Museum (2 pods)


8 of 21

New “mini-pod” Design

                     Original pod    Mini-pod
Storage servers      5               2 + 1 JBOD
Other servers        3               1
Physical size        25U             7U
Cost                 ~$150k          ~$100k
Raw size             ~1.5 PB         ~2 PB
Server redundancy    1 server        0 servers
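The comparison above can be reduced to two rough ratios: cost per raw petabyte and raw capacity per rack unit. The sketch below uses only the approximate figures from the table; the dictionary keys and function names are illustrative, and the results inherit the "~" uncertainty of the inputs.

```python
# Approximate figures copied from the pod comparison above.
pods = {
    "Original pod": {"cost_usd": 150_000, "raw_pb": 1.5, "rack_units": 25},
    "Mini-pod": {"cost_usd": 100_000, "raw_pb": 2.0, "rack_units": 7},
}


def cost_per_pb(pod):
    """Approximate purchase cost per raw petabyte."""
    return pod["cost_usd"] / pod["raw_pb"]


def pb_per_rack_unit(pod):
    """Approximate raw petabytes per rack unit (U)."""
    return pod["raw_pb"] / pod["rack_units"]


for name, pod in pods.items():
    print(
        f"{name}: ~${cost_per_pb(pod):,.0f} per raw PB, "
        f"~{pb_per_rack_unit(pod):.2f} PB per U"
    )
```

On these numbers the mini-pod costs roughly half as much per raw petabyte, with no spare server for redundancy.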


9 of 21

OSN Bucket API

  • New API exposes all publicly available buckets across all of the OSN pods.
  • The API includes endpoints for:
    • buckets
    • bucket stats
    • objects
    • object stats
  • Future updates will include additional metadata about the datasets
  • API could be consumed by the NSDF Catalog to give the NSDF community access to OSN data
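As a rough illustration of how a catalog might consume the four endpoint kinds listed above, the sketch below builds request URLs and maps a bucket listing to catalog entries. Everything specific here is an assumption, not the OSN API: the base URL, the route layout, the `name` and `pod` response fields, and the function names are hypothetical placeholders. No network call is made.

```python
from urllib.parse import quote

# Hypothetical base URL and routes: the slides name the endpoint kinds
# (buckets, bucket stats, objects, object stats) but not their paths.
BASE_URL = "https://osn-api.example.org"
KINDS = {"buckets", "bucket_stats", "objects", "object_stats"}


def endpoint(kind, bucket=None, key=None):
    """Build a URL for one endpoint kind (hypothetical route layout)."""
    if kind not in KINDS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    if kind == "buckets":
        return f"{BASE_URL}/buckets"
    if bucket is None:
        raise ValueError(f"{kind!r} needs a bucket name")
    bucket_url = f"{BASE_URL}/buckets/{quote(bucket, safe='')}"
    if kind == "bucket_stats":
        return f"{bucket_url}/stats"
    if kind == "objects":
        return f"{bucket_url}/objects"
    if key is None:
        raise ValueError("'object_stats' needs an object key")
    return f"{bucket_url}/objects/{quote(key, safe='')}/stats"


def to_catalog_entries(bucket_listing):
    """Map a bucket listing to catalog entries.

    Assumes each item is a dict with 'name' and 'pod' keys
    (a hypothetical response shape).
    """
    return [
        {
            "source": "OSN",
            "bucket": item["name"],
            "pod": item["pod"],
            "objects_url": endpoint("objects", item["name"]),
        }
        for item in bucket_listing
    ]


# Made-up sample listing, standing in for a real API response.
sample_listing = [{"name": "example-bucket", "pod": "sdsc"}]
for entry in to_catalog_entries(sample_listing):
    print(entry)
```

Keeping URL construction separate from the catalog mapping means only `endpoint` would need to change once the real routes are known.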


10 of 21

OSN Publicly Available Datasets


11 of 21

Takeaways and Opportunities for NSDF

Lessons learned from the XENON use case:

  • We can add value, but projects have very little tolerance for transition time unless the change addresses a project pain point
  • OSG now integrated with other components
  • Introduced Qumulo storage option for easy mounting from HPC
  • Partnership dependent on individuals (our contact left)

Opportunities from OSN partnership:

  • Look to newly funded pods as additional use cases and as sources of data to catalogue


12 of 21

Data Landscape Update:

FAIR Digital Objects (FDO)


13 of 21

What is a FAIR Digital Object?

  • a chunk of any digital content
    • data, metadata, schemas, software, workflow, model
  • object-oriented data for the next iteration of the internet (name TBD: the datanet)

...01100011011100111000100101001010010111100110100...

Source: some slide content from FDO Forum: fairdo.org


14 of 21

Making FDOs Useful

We need to know:

  • What is it?
  • How was it created?
  • Where is it stored?
  • How persistent is it?
  • Who is allowed to access it?
  • How to reuse it?
  • How to interpret it?
  • With which operation can we process it?

Currently, to find out, we have to ask:

  • data scientists
  • data stewards
  • data managers
  • institution managers
  • system operators

This is slow, time-consuming, inconvenient, and error-prone, and it is not usable by machines.


15 of 21

Digital Object to Bundle Useful Info

  • A global, unique, resolvable, and persistent identifier (PID) is associated with a set of agreed kernel attributes.
  • The PID system resolves a PID to attribute-value pairs.
  • If the PIDs are persistent, the associated values are persistent as well.
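The resolution step described above, from a PID to a fixed set of attribute-value pairs, can be sketched with a small in-memory registry. This is an illustration under stated assumptions, not an FDO or Handle System implementation: the `PidRegistry` class, the example PID, and the attribute names are all hypothetical.

```python
from types import MappingProxyType


class PidRegistry:
    """In-memory stand-in for a PID system (illustration only)."""

    def __init__(self):
        self._records = {}

    def register(self, pid, kernel_attributes):
        """Bind a PID to its kernel attributes exactly once."""
        if pid in self._records:
            raise ValueError(f"PID already registered: {pid}")
        # A read-only view models persistence: values cannot be edited later.
        self._records[pid] = MappingProxyType(dict(kernel_attributes))

    def resolve(self, pid):
        """Return the attribute-value pairs bound to a PID."""
        try:
            return self._records[pid]
        except KeyError:
            raise LookupError(f"unknown PID: {pid}") from None


registry = PidRegistry()
registry.register(
    "example/0001",  # hypothetical PID
    {
        # Hypothetical kernel attribute names and values.
        "digitalObjectType": "dataset",
        "digitalObjectLocation": "https://data.example.org/objects/0001",
        "dateCreated": "2023-01-01",
    },
)

record = registry.resolve("example/0001")
for name, value in record.items():
    print(f"{name} = {value}")
```

A client that holds only the PID can learn the object's type and location from the record, without asking a person.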


16 of 21

From DO to FDO

  • Digital Objects are used in many applications (object stores, publications, film fragments, large scientific collaborations, etc.)
  • No harmonisation and not machine actionable
  • Machine actionable DO → FAIR Digital Objects


17 of 21

So What?

  • Retrofit existing data
  • Package data with metadata
  • Fewer copies of data needed
  • Enable data federation
  • Globally identify misinformation


18 of 21

Short FDO History

  • “Digital objects”: 1990s and earlier, related to DOIs
    • Kahn, R., & Wilensky, R. (2006). A framework for distributed digital object services. International Journal on Digital Libraries. https://doi.org/10.1007/s00799-005-0128-x
  • Emerged over the last 5-6 years to include DOIP, handle & type registries
  • ‘Paris meeting’ 2019, organized by GEDE
    • Introduction of Luiz Bonino’s FAIR DO framework
  • Formation of FDO Forum, January 2021


19 of 21

FDO Forum

Executive Board & Steering Committee

Working groups

http://fairdo.org


20 of 21

Is there a ‘there there’?

1. Specification Documents

  • FDO Requirement Specifications (v2)
  • FDO Machine Actionability (v2.1)
  • Typing FAIR Digital Objects (v2.1)
  • FDO PID Profiles & Attributes (v2.1)
  • FDO - Granularity, Versioning, Mutability (v2.1)
  • FDO Configuration Types (v2.1)

2. MOU and partnership with DIN (German ISO)

3. Working example in EU research infrastructure project DiSSCo: Distributed System of Scientific Collections


21 of 21

Future Work for NSDF

  • Match up with use cases that reuse data with insufficient metadata
  • Be the first US research infrastructure to be FDO-ready
    • Provide type and (PID) registry services
    • Consider awareness-raising and introductory training by partnering with
      • GO FAIR US
      • Research Data Alliance FAIR Digital Object Fabric IG
      • FDO Forum
  • Contribute to working groups and (test) specifications
