GORC International Model WG

P19 Working Group Session

Karen Payne, ISC World Data System

Mark Leggott, Digital Research Alliance of Canada

Andrew Treloar, Australian Research Data Commons

June 21-22, 2022

A WG within the GORC IG

gorc_model@rda-groups.org

Agenda

  1. Welcome (10 min)
  2. Introduction to meeting and previous work (10 min)
    • Takeaway messages from P18
    • Review and update from the IG: Socialization/explanation of new diagram and associated definitions
  3. Coordination with related BoFs (15 min)
  4. Speaker series recap and live document development
    • EOSC (15 min)
    • ARDC (15 min)
  5. Interactive discussion (15 min)
  6. Next steps (10 min)

WG Rolling Notes

https://bit.ly/3xljZ03

Introduction to meeting and previous work

Takeaway messages from P18

  1. We will use eInfra, geographic region, domain and GORC typology component as a beginning set of tags for Zotero resources (to be expanded as needed)
  2. We will recruit speakers from inside and outside the WG to describe the initial set of Commons in a series of WG meetings
  3. Speaker series: IVOA; NII Research Data Cloud; EOSC; ARDC
  4. Supply speakers before the presentation with a common list of questions and a checklist of eInfra categories
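
As an illustration of the tagging scheme in takeaway 1, here is a minimal sketch in Python. It is not the WG's actual Zotero setup: the `dimension:value` tag format, the `Resource` class, and the example titles and tag values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The four starting tag dimensions named in takeaway 1 (to be expanded as needed).
TAG_DIMENSIONS = ("eInfra", "region", "domain", "gorc_component")


@dataclass
class Resource:
    """A library entry carrying 'dimension:value' tags (hypothetical format)."""
    title: str
    tags: set = field(default_factory=set)

    def add_tag(self, dimension: str, value: str) -> None:
        # Reject dimensions outside the agreed starting set.
        if dimension not in TAG_DIMENSIONS:
            raise ValueError(f"unknown tag dimension: {dimension}")
        self.tags.add(f"{dimension}:{value}")


def filter_by_tag(resources, dimension, value):
    """Return the resources tagged with the given dimension and value."""
    wanted = f"{dimension}:{value}"
    return [r for r in resources if wanted in r.tags]


# Hypothetical entries, for illustration only.
eosc = Resource("EOSC overview")
eosc.add_tag("region", "Europe")
eosc.add_tag("eInfra", "compute")

ardc = Resource("ARDC service catalogue")
ardc.add_tag("region", "Australia")

europe = filter_by_tag([eosc, ardc], "region", "Europe")
print([r.title for r in europe])
```

Keeping the dimension inside the tag string means a flat tag list can still be filtered per dimension, and new dimensions only require extending `TAG_DIMENSIONS`.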

IG update: Elements of a Commons

IG update: Definitions

The definitions of the terms in the diagram are still a collaborative work in progress

Current document, with edits and comments, is available here

Please add your comments if anything is unclear

We aim to finalise the definitions within the next 2-3 months

Coordination with other BoFs (work in progress)

  • High level aspirational goals
  • Map high level principles to best practices and implementation
  • Community building
  • Expansion from EU context
  • Taxonomy/definitions of data spaces
  • Non-national level infrastructures

Data Commons Principles and Implementation BoF

BoF Research Data Spaces Taxonomy

European Open Science Cloud

European Open Science Cloud – Context

  1. The EOSC data space is one of 13 sector data spaces in the broader EU data strategy (health, mobility, agriculture, tourism, manufacturing, etc.)

  • Ecosystem approach for EOSC, with 4 main strands
    1. Open Science Policies and Incentives
    2. FAIR data management
    3. Federated access to data and services
    4. Engagement with communities
      1. The big challenge for Europe is that it is very wide and has many stakeholders

European Open Science Cloud – Context

  • The goal of EOSC is to bring together, under one umbrella, existing federations that support research (a federation of federations)
    • GÉANT, the pan-European federated network
    • The data management portfolio of EUDAT
    • EGI for cloud compute
    • PRACE for supercomputing, HPC
    • Research outputs managed by OpenAIRE
  • The next level down brings national and institutional structures into the portfolio
  • Then come specific thematic research infrastructures, e.g., telescopes and the Large Hadron Collider (LHC)
    • These are much more tailored to specific needs, which can make interoperability more challenging.

European Open Science Cloud – Core Services

  • EOSC mediates access to existing infrastructure; this may make it distinct from other commons/infrastructures
    • Focusing on research materials produced during the research lifecycle (as well as educational material)
    • Creating a minimum quality framework for each type of digital research object
    • Federating services across regions/domains is challenging. This is being addressed in 2 ways:
      • Public procurement of the EOSC Core: both a common platform and coordinating services that allow enforcement of various policies and the EOSC Interoperability Framework (implementation of the marketplace coordinating services is the policy enforcement mechanism when onboarding services)
      • Community-developed framework and interoperability documents and policies (AAI, PID, metadata, security, procurement, data, etc.)

European Open Science Cloud – Core Services

  • The vision for the EOSC catalogue is to move beyond just a list of resources to a set of ‘plug and play’ components
  • EOSC hopes that the service catalogue can grow to provide “service packages” that result in an open research workflow
  • Eventually there will be some sort of interface that allows researchers to choose, create and execute workflows; there is not yet a way to assemble the tools in the catalogue into an operable workflow.
    • VREs and Gateways could help achieve this vision
    • The procurement of the EOSC Core is the primary focus of work towards this

European Open Science Cloud – Priorities

  • The EOSC Strategic Research and Innovation Agenda
    • Identifies 3 goals (challenges for people, data and infrastructure)
    • Refined into 14 action areas (7 technical implementation challenges and 7 boundary conditions)
  • Lists community-identified priority areas (roadmap) for each of the next 3 phases of EOSC (through 2030)
  • A set of success metrics for EOSC as a whole (not for individual services)
  • The EOSC Association has developed 13 task forces to progress these areas
  • RDA can help create consensus on avenues under consideration; need global solution

Australian Research Data Commons

ARDC – Key takeaways: Context

  • ARDC roadmap under revision responding to NRI Roadmap
  • Purpose: “to provide Australian researchers with competitive advantage through data;” Mission: “to accelerate research and innovation by driving excellence in the creation, analysis and retention of high-quality data assets”
    • User engagement is highest in research and weakest in industry.
  • Since 2019, built on the consolidation of 3 entities that date back to 2009:
    • Australian National Data Service (ANDS)
    • Nectar (National eResearch Collaboration Tools and Resources) and
    • Research Data Services (RDS)

ARDC – Key takeaways: Offerings

  • Service catalogue - https://ardc.edu.au/services/
    • Role of ARDC: connect and fill gaps
      • Aggregate data/information from different providers; provide pointers to other services
      • See what is being offered and ask “is there value in providing a single central service”
      • Fill gaps in existing service offerings
    • 7 high level categories (at the time of this review)
      • Nectar Research Cloud
      • Research Data Australia
      • Identifier Services
      • Research Vocabularies Australia
      • Research Digital Platforms
      • Communities of Practice (CoPs)
      • Engagement and Support

ARDC – Key takeaways: Components

  • A single national commons, with thematic (sub)commons
    • Humanities and Social Sciences (HASS) (initial version being built),
    • Health and Medical (being commissioned) and
    • Ecology/Agriculture (being planned)
      • Q: What is the relationship between these thematic commons and the Translational Research Data Challenges? Are they intended to showcase the functionality of VREs?
      • “more focussed RDCs will be designed to connect into a national RDC and relevant international coordination activities”. Q: Is the national RDC intended as the mechanism for international coordination, are the thematic commons the international engagement mechanism, or both? How will the national and thematic commons manage international engagements?
      • Q: Will the thematic commons be listed in the service catalogue? As a separate service type or as part of communities of practice or platforms?

ARDC – Key takeaways: Interface and Interoperability

  • Active investment in 26 VREs/Platforms
    • A platform is “a set of online services, often with associated integration and/or orchestration functions and connections to specific data resources, that enable researchers to collect or generate data, analyse those data, and produce outputs that can be made Findable, Accessible, Interoperable, and Reusable (FAIR).”
    • Integrations between services: a number of the VREs use services that ARDC provides (e.g., vocabulary, identifier services, storage, and compute services)
    • The Platforms program is “abstracting services”
    • Q: are these platforms designed to have re-usable workflows? Are they using the same development framework? For both the thematic commons and VREs: any common frameworks, development environments, commonalities that would ease their reusability and interoperability with other commons?

ARDC – Key takeaways: Interface and Interoperability

  • Standards supported by domains
    • Q: How do you interface with the domain communities? Are there standing committees that help identify domain accepted standards? Is that done either at the project level via VREs or as part of the new thematic Commons organizing bodies?
  • National data discovery service: ISO 2146 / RIF-CS (with OAI-PMH)
    • Increasingly using SDO for harvesting
  • ORCID for people
  • ARDC runs the national CiteData DOI service
    • Q: Any standards, guidelines or common practices associated with vocabulary, software or compute services?
    • Q: Is there a standing body that makes decisions about implementation standards or best practices?
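
For readers unfamiliar with OAI-PMH, the harvesting protocol named above: it works through plain HTTP requests carrying a `verb` parameter. The sketch below builds a `ListRecords` request URL. The base URL is a placeholder, not ARDC's endpoint, and the function name is illustrative. `oai_dc` is used because the protocol requires every repository to support it; a RIF-CS feed would use whatever metadata prefix the repository advertises.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional selective harvesting by datestamp.
        params["from"] = from_date
    if set_spec:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, for illustration only.
url = list_records_url("https://example.org/oai", from_date="2022-01-01")
print(url)
```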

ARDC – Key takeaways: Priorities and Integrations

  • Organizational KPIs under review - will be shared
  • Research Data Australia harvests metadata from other institutions to create its searchable catalogue
    • Q: Does this include harvesting from institutions outside Australia, in addition to the MOU with KISTI (Korea) under which data catalogue exchange is being explored?
  • Engagement through RDA, CODATA GOSC, Research Software Alliance
  • Promotes use of CoreTrustSeal for repositories; security remains the responsibility of the partner data providers

ARDC – Key takeaways: Governance

  • ARDC is a company limited by guarantee; its board provides oversight
  • No formal rules of participation
  • Governance arrangements vary: many different partnerships, many ad hoc
  • Seat on boards when ARDC is a funder; participation in national coordination groups
  • Institutions (not individuals) are required to sign agreements for Commons use
  • Commitment to work at national policy level; advocacy is an important part of ARDC’s role
  • Organizational norms that incentivize participation are emergent
    • Q: When you say “ARDC is small relative to other digital infrastructure investments in Australia” what are the larger investments/stakeholders?

ARDC – Key takeaways: eInfra checklist

  • Q: Does ARDC publish an open access journal? Or does this refer to publishing other artefacts?
  • Q: It seems as if the eInfra categories are not a bad mechanism for describing ARDC services, would you agree?

ARDC Components

  • ARDC Progenitors
    • ANDS
    • Nectar
    • Research Data Services
  • ARDC Thematic Commons
    • HASS
    • Health and Medical
    • Ecology/Agriculture
  • ARDC Interfaces
    • API
    • Web
    • 26 VREs/Platforms
  • ARDC Services
    • 7 high level categories
  • ARDC Strategic plan categories
    • Coordination and coherence
    • People and policy
    • Data and services
    • Software and platforms
    • Storage and compute

Confirmed Commons Participants

RED check indicates agreed to be part of the speaker series

Pan National Commons

  1. European Open Science Cloud
  2. African Open Science Platform [tentative]
  3. Nordic e-Infrastructure Collaboration (NeIC)

Domain Commons

  1. International Virtual Observatory Alliance (IVOA)
  2. Canadian Consortium for Arctic Data Interoperability (CCADI)

Non-European Commons

  1. China Science and Technology Cloud (CSTCloud)
  2. Australian Research Data Commons
  3. The Alliance (Canada)
  4. NII Research Data Cloud (Japan)
  5. KISTI (South Korea)
  6. Malaysian OSP

Potential Commons Participants

Pan National Commons

Domain Commons

National Commons

  1. German National Research Data Infrastructure (NFDI)
  2. UK JISC Open Research Framework

Non-European

  • Other US Commons initiatives?
  • BR-CRIS (Brazil)
  • US Cancer Research Data Commons

Commercial

  • GAIA-X
  • MS Planetary Computer

Open discussion and live document development

Martyr document suggestions

Potential Martyr Doc Development: Functions

Observed Functionality

Simultaneously with the speaker series, should we try to identify the types of functions provided by each of the Commons? Examples include a federated metadata search for all national holdings, types of PID services, a service that creates a graph database showing the relationships between different research outputs, an open access journal, etc.

Potential Martyr Doc Development: Orgs/Partners

Commons

International consortia

National/Regional/Domain consortia

  • IVOA
    • Q: Is it fair to say IVOA is a domain consortium?
  • EOSC
    • Q: Should all of the NRENs be listed as part of eduGAIN?
    • GÉANT (full NREN list)
    • EUDAT - data management
    • EGI - cloud compute
    • PRACE - supercomputing, HPC
    • OpenAIRE - research outputs
  • NII
    • COAR, DataCite, OpenAIRE, Crossref and RIOXX, ORCID
    • SINET - Japanese NREN
  • ARDC
    • AARNet - Australian NREN

Observations: Reproducible workflows

EOSC hopes that the service catalogue (the e-infra catalogue) can grow to provide “service packages” that result in an open research workflow

  1. Eventually there will be some sort of interface that allows researchers to search the catalogue, then choose services and compose, create and execute workflows; there is not yet a way to assemble the tools in the catalogue into an operable workflow.
  2. They see VREs and Gateways as one area where this functionality can be developed

NII is developing machine-actionable DMPs (same end goal?)

  • They are developing vocabularies that extend the RDA DMP Common Standard for MA DMPs
  • NII is building an MA DMP (using, in part, DataLad)
  • The goal is to have the DMP orchestrate the NII services: it will not only be a place to write the DMP (and submit to funding agencies) but will also deploy the appropriate research environment to execute the DMP
  • If the DMP is deployed in a Jupyter notebook, will it include the ability to define analytic workflows?
  • The NII Cloud can already bundle and publish data and software packages

Next Steps

  • June 23, 2022: RDA P19
  • July 28, 2022: KISTI
  • August 25, 2022: CCADI
  • September 22, 2022: The Alliance
  • October 27, 2022: WG Meeting - Recaps and Doc development
  • November 24, 2022: NeIC

Create Supporting Outputs for the GORC IG

GORC v. GOSC

  • Looking at how the commons orgs are organized
  • What they provide – scope and depth
  • Engagement with Commons leadership
  • Broad and deep VRE review by domain – landscape
  • Review of multiple platforms, standards, orgs, esp data flows
  • 3 high level WGs looking at issues like policy, interop
  • Very specific case studies – specific platforms, standards and workflows
  • China-centric, almost no US engagement; better resourced
  • “Lightweight policy review” of EOSC Rules of Participation carried out by Sarah Jones
    • How does that tie in with the GORC IG outputs?

I think both are creating a roadmap?
