1 of 27

GORC International Model WG

Working Group Session

Karen Payne, ISC World Data System

Mark Leggott, Digital Research Alliance of Canada

Andrew Treloar, Australian Research Data Commons

April 28, 13:00 UTC

A WG within the GORC IG

gorc_model@rda-groups.org

2 of 27

GORC International Model WG

Working Group Session

Karen Payne, ISC World Data System

Mark Leggott, Research Data Canada

Andrew Treloar, Australian Research Data Commons

March 24, 13:00 UTC

A WG within the GORC IG

gorc_model@rda-groups.org

3 of 27

Agenda Items

  1. Recap NII Research Data Cloud
    1. Thank you Kazu!
  2. Next steps

WG Rolling Notes

https://bit.ly/3xljZ03

4 of 27

NII Summary: Sources

  1. Link to NII questionnaire
    1. https://bit.ly/3MCbJ3k
  2. Meeting
  3. WG Rolling Notes

5 of 27

NII Research Data Cloud – Summary

  1. Context
    1. The National Institute of Informatics (NII) is part of Japan’s Society 5.0 initiative, a whole-of-government aspiration
    2. “A human-centered society that balances economic advancement with the resolution of social problems by a system that highly integrates cyberspace and physical space”
    3. It follows from the hunting society (Society 1.0), agricultural society (Society 2.0), industrial society (Society 3.0), and information society (Society 4.0).
    4. Society 5.0 cuts through the overwhelming burden of overflowing information in a way that supports the core values of mutual respect and lets each and every person lead an active and enjoyable life

6 of 27

NII Research Data Cloud – Summary

  1. NII is an NREN (National Research and Education Network)
    1. Provides upper-layer services on top of the network
      1. AAI/several identity federations
      2. Cloud service
      3. Security service
      4. Top-layer service: the NII Research Data Cloud (top priority for resources)

7 of 27

NII Research Data Cloud – Summary

  1. 3 interconnected platforms (Started 2017 - Operational since 2021):
    1. Micro-service architecture within each platform
    2. Each new service has an API and supports a portion of the research lifecycle
    3. Built on top of a fork of the (US) Open Science Framework (https://osf.io/) (?)
    4. 4th educational platform in development (?)

8 of 27

NII Research Data Cloud – Summary

  • Publications - WEKO3; disclosure/accumulation/self-archiving. SaaS.
    • Primarily designed for journal articles; can also publish bundled packages of data and software emanating from the other 2 platforms.
    • Open to all domains, which can use their own domain-specific metadata schemas
      1. WEKO harmonizes the metadata (?)
      2. Publishes metadata of research outputs via OAI-PMH
    • Users can also construct their own domain-specific repository using the platform
    • Aggregates publications from institutional repositories across Japan (part of the international Confederation of Open Access Repositories (COAR) community)
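
As an illustration of what publishing metadata via OAI-PMH means for a harvester, the sketch below builds a `ListRecords` request URL. The verb and argument names (`metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH protocol; the base URL and the helper name `list_records_url` are hypothetical and are not taken from WEKO3 or any NII documentation.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a real repository publishes its own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no network call is made)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only, to fetch the next page of results.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            # Optional selective harvesting by set.
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, resumption_token="abc"))
```

A harvester would issue the first URL, parse the XML response, and repeat with the returned resumption token until none is returned.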

9 of 27

NII Research Data Cloud – Summary

  • Discovery - CiNii; search / discover / use artefacts: datasets and publications (articles/books/dissertations), the researchers who produced those results, and information on research projects. Central Service
    • No catalogue of services, but NII RDC does provide a data catalogue via OAI-PMH, as part of CiNii
    • Associated PID service
    • Aggregates data from institutional repositories and JP research communities
      • 2 back end DBs:
        1. International data (?)
        2. Subject repository
    • Information on projects is derived from an aggregator, KAKEN (?)

10 of 27

NII Research Data Cloud – Summary

  • RDM - GakuNin - intended for researchers, primary development focus; manage research data and relevant files during a research project. Central Service. Services:
    • Closed setting, controlled file-sharing access among project members
    • File versioning
    • Hot and cold storage
    • Long-term preservation
    • Time-stamping/proof of data existence
    • Metadata updates
    • Currently working on DMP tools
      • DMPs for submission to funding agencies
      • DMPs for researchers’ own data management
      • Looking at the RDA machine-actionable DMP Common Standard
        • Developing enhancements to the DMP vocabulary to fit the entire research data cloud
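
For readers unfamiliar with machine-actionable DMPs, the following is a minimal sketch of a DMP expressed as structured data. The field names follow the RDA DMP Common Standard as best understood here, and the `REQUIRED` list is an assumption that should be checked against the published standard. All values are placeholders, and nothing below reflects NII's actual vocabulary extensions.

```python
import json

# Placeholder maDMP. Field names are assumed from the RDA DMP Common Standard;
# none of the values come from NII or GakuNin RDM.
madmp = {
    "dmp": {
        "title": "Example project DMP",
        "created": "2022-04-28T13:00:00Z",
        "modified": "2022-04-28T13:00:00Z",
        "language": "eng",
        "ethical_issues_exist": "unknown",
        "dmp_id": {"identifier": "https://example.org/dmp/1", "type": "url"},
        "contact": {
            "name": "Example Researcher",
            "mbox": "researcher@example.org",
            "contact_id": {
                "identifier": "https://orcid.org/0000-0000-0000-0000",
                "type": "orcid",
            },
        },
        "dataset": [
            {
                "title": "Example dataset",
                "dataset_id": {
                    "identifier": "https://example.org/dataset/1",
                    "type": "url",
                },
                "personal_data": "no",
                "sensitive_data": "no",
            }
        ],
    }
}

# Assumed list of required top-level fields; verify against the standard.
REQUIRED = ("title", "created", "modified", "language",
            "ethical_issues_exist", "dmp_id", "contact", "dataset")


def missing_fields(doc):
    """Return the assumed-required fields absent from doc["dmp"]."""
    dmp = doc.get("dmp", {})
    return [name for name in REQUIRED if name not in dmp]


print(json.dumps(madmp, indent=2))
print("missing:", missing_fields(madmp))
```

Because the plan is data rather than free text, a service could read it (for example the dataset list) and act on it, which is the property the bullet on fitting the DMP to the entire research data cloud relies on.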

11 of 27

NII Research Data Cloud – Summary

  • NII adds 7 functions on top of the 3 interconnected platforms
    • Data Governance
    • Data Provenance
    • Code Package
    • Secure Computation
    • Data Curation
    • Secure Storage
    • Capacity Building

12 of 27

NII Research Data Cloud – Summary

  • Connections between platforms
    • Discovery and Publications (CiNii and WEKO)
      • Both are public facing and work together; the focus is on research papers (less on research data)
      • Link data to article
      • PIDs: ORCID, DOI, ISNI (and research project?)
    • Discovery, RDM and Publications (all 3)
      • The WEKO3 publication repository is primarily designed for journal articles but can also publish bundled packages of data and software emanating from the other 2 platforms (they can also be published to GitHub).

13 of 27

NII Research Data Cloud – Summary

14 of 27

NII Research Data Cloud – Summary

  • Bottom-up and Top-down approaches

→ relationship between NII and researchers is mediated by university libraries and IT departments

→ maintaining good relations with Cabinet Office and Education Ministry

  • Trust is key!

→ being always receptive & responsive to the university (library/IT) & community needs gives NII a “competitive” advantage over other market products

  • Business model & sustainability

→ Options: bilateral exchange with other countries vs. Open Infrastructures

Moving from 3 platforms to RDM system (data governance, data provenance, code package, secure computation, data curation, capacity building, secure storage)

15 of 27

NII Research Data Cloud – Summary

Governance: NII adheres to the Committee on Open Science at the Ministry of Education and Research

16 of 27

NII Research Data Cloud – Summary

  • A lot of success providing services to 700 institutions across Japan; actively seeking international connections to other Commons (both domain and general purpose)
  • Discovery and publication are public facing
    • The difficult part is the closed environment, that is GakuNin RDM platform - developing trust to allow access from industry/public

17 of 27

NII Research Data Cloud – Summary

  • Business model & sustainability - Options: bilateral exchange with other countries vs. Open Infrastructures
  • Opening the platform up to international users affects the sustainable business model
    • Operating budget from the JP government, supplemented by fees from JP universities.
    • If the users are within JP, no problem; if people outside of JP use the RDM platform, we have to rethink the business model and membership policy.
    • It is possible we could charge a fee to the JP-based PI (if there is one), but at this time we do not have a solution that allows NII to open the RDM platform as much as possible

18 of 27

NII Research Data Cloud – Summary

  • Security: ISO 21001:2018 certified (“Educational organizations — Management systems for educational organizations — Requirements with guidance for use”); this is not a security policy but an audit of the NII operation.
  • In the case of AAI, NII and members define operational policy, but the operation of IdPs is distributed (recall it is an NREN)
  • NII has a role in keeping the security of the RDM and Discovery Platforms (central services), and the Publication Platform (SaaS).

19 of 27

  • Outstanding questions: Overall Infrastructure / Security

20 of 27

  • Outstanding questions: WEKO
    • WEKO harmonizes the metadata (?) How? via broker? Is there a national standard for publication metadata and for data/software bundles?
    • Researchers can “add modules or add-ons that can only be used for the domain space”
      • Is that true of all 3 platforms or just WEKO?
      • How is that managed, is there a code review for each platform?
      • Is there a marketplace for contributed addons?
      • Is there guidance about writing and submitting add ons, or do they only run locally, outside of the platform, but can receive content from one of the 3 platforms via an API?

21 of 27

  • Outstanding questions: CiNii
    • Would you say that CiNii is similar to OpenAIRE?
    • When you say that CiNii aggregates data from institutional repositories and JP research communities and one of them is “International data” (“Data Exchange with International Discovery Service”), what does that mean? Does it mean open to the international community to search? Or that the data were published in a repository outside of JP?
    • You can search for information on funded projects within CiNii: Do these reflect a national metadata standard and process to describe projects funded only by the Government of Japan? All Japanese funding sources? All funding sources, domestic and international?

22 of 27

  • Outstanding questions: GakuNin RDM
    • Does the RDM platform provide access to computing power (HPC, VM)? I understand that reusable software and data containers can be created
    • “... NII RDC Computational Services enable a researcher to create a package that contains his/her data processing program (software) together with its execution environment (configuration file) so that it can be easily reused by other researchers or students. The package is stored in GakuNin RDM and shared within a group, or publicated in WEKO3 or GitHub”
    • But does NII also have compute power available to researchers?

23 of 27

  • Outstanding questions: Future development
    • Are you developing new ontologies for the MA DMP? Do you provide a vocabulary service to support it?
      • Is the machine-actionable DMP built on top of the (Australian) Research Data Box (ReDBox)?
      • Is the intent to develop the MA DMP such that it has the capability to deploy an appropriate research environment across the NII ecosystem? Develop so that it can orchestrate NII services (across all 3 platforms)?
      • Is the MA DMP deployed as a Jupyter Notebook?
    • Are you currently developing a 4th educational platform? Another OSF fork?

24 of 27

  • Outstanding questions: Governance
    • Each of the 3 platforms, are they drawing on existing federations?
      • The publication platform in part has COAR content; other federations?
      • When you say “...we provide our metadata to OpenAIRE, BASE Core, Google and commercial platforms.” are you referring to Google Scholar and Google Dataset Search? Any other commercial entities?
    • Is there a need here for a new global Commons consortium? Or should this be handled via NRENs? Or WGs to develop international governance policies?

25 of 27

Next Steps

  • Australian Research Data Commons, Andrew Treloar, May 26, 2022
  • P19 June 23, 2022
    • Recap EOSC and ARDC
    • Martyr doc development
    • How do we capture Commons comparisons
  • Sa-kwang Song, KISTI, July 28, 2022

26 of 27

EOSC hopes that the service catalogue (the e-infra catalogue) can grow to provide “service packages” that result in an open research workflow

  1. Eventually there will be some sort of interface that allows researchers to search the catalogue, then choose services and compose, create and execute workflows; there isn’t a way, yet, to assemble the tools in the catalogue into an operable workflow.
  2. They see VREs and Gateways as one area in which this functionality can be developed

Observations: Reproducible workflows

(Japan) NII is developing machine actionable DMPs with the same end goal in mind.

  • They are developing vocabularies that extend the RDA DMP Common Standard for MA DMPs
  • Their MA DMP is built on the (Australian) Research Data Box (ReDBox)
    1. ReDBox was designed (?) to support the (UK) DCC Curation cycle
  • The goal is to have the DMP orchestrate the NII services - it will not only be a place to write the DMP (and submit to funding agencies) but will also deploy the appropriate research environment to execute the DMP
  • The DMP - I believe - is deployed in a Jupyter notebook.

27 of 27

Previous Session – NII Research Data Cloud

Key Takeaways

  • Bottom-up and Top-down approaches: Combined approach

→ relationship between NII and researchers is mediated by university libraries and IT departments

→ NII maintains good relations with Cabinet Office and Education Ministry of Japan

  • Trust is key!

→ being always receptive & responsive to the university (library/IT) & community needs gives NII a “competitive” edge over other market products

  • Business model & sustainability

→ Remains a key challenge

→ Options: Bilateral access agreements with other countries vs. Global Open Infrastructures

NII Research Data Cloud

Three original platforms (RDM, publication, discovery service) with new services planned and in development (data governance, data provenance, code package, secure computation, data curation, capacity building, secure storage)