1 of 27

Powering open science and collaboration with Invenio

Northwestern University Invenio Team

03 March 2020

@inveniosoftware

2 of 27

OHDSI: open, collaborative science

BENEFICENCE

We seek to protect the rights of individuals and organizations within our community at all times.

COLLABORATION

We work collectively to prioritize and address the real world needs of our community’s participants.

COMMUNITY

Everyone is welcome to actively participate in OHDSI, whether you are a patient, a health professional, a researcher, or someone who simply believes in our cause.

OPENNESS

We strive to make all our community’s proceeds open and publicly accessible, including the methods, tools and the evidence that we generate.

REPRODUCIBILITY

Accurate, reproducible, and well-calibrated evidence is necessary for health improvement.

INNOVATION

Observational research is a field which will benefit greatly from disruptive thinking. We actively seek and encourage fresh methodological approaches in our work.

VALUES

MISSION: To improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care.

3 of 27

4 of 27

Benefits of opening science...

5 of 27

Invenio software powers open science

6 of 27

How did this collaboration start (and what about Zenodo?!)

What motivated the InvenioRDM project?

  • Some organizations tried to reuse the existing open source Zenodo source code
  • Other orgs tried to use the Invenio Framework to build an RDM repository from scratch
  • Several orgs tried to make the same modifications but had no easy way of sharing their changes

All these groups came together to create a collaborative open source project and grow a sustainable community.

Zenodo will also run on InvenioRDM by the end of the project period.

7 of 27

  • Research, shared. Securely share and preserve data records and a wide range of research types with collaborators. Allows easy dissemination to the community.
  • Discoverable. Leverages metadata standards, and the powerful Elasticsearch full-text search engine retrieves, facets, sorts, and filters your searches with ease.
  • Scalable. Invenio is fast. Designed to manage 100+ million records and petabytes of files. All data can be archived independently of the size.
  • Communities. Create and curate your own community (e.g., workshop, project, lab, or journal).
  • A robust community: Large team of developers & active open source community. A SaaS model for service via TIND (CERN spinoff). Invenio is widely used by many organizations, and its underlying technology (Python, Flask) is widely supported.
  • Next-Generation: With InvenioRDM, any organization can launch a turn-key open source next-generation repository platform with world-class features to support open and FAIR science. http://ngr.coar-repositories.org/
  • Get credit & be cited. Get a DOI to make records easily and uniquely citable. Pre-formatted citation text makes it easy to cite your work and be cited. Contributor roles allow you to recognize the whole team.
  • Metrics. Industry standard usage statistics for record pages with all tracking completely anonymized.
  • FAIR. Advanced features to make your research Findable, Accessible, Interoperable, & Reusable.
  • Compliance-friendly. Comply with data sharing mandates* and acknowledge your funders.
  • Easy. Turn-key research data management platform & index can be easily deployed in the local environment by your team or by a service provider, such as TIND. Customize the look and feel to your local environment.

RDM platforms are critical to help preserve and share research, enable reproducibility, and empower reuse of datasets, protocols, engagement or study materials, & a wide range of other research products.

We’re leveraging Invenio as a strong foundation. Here’s why.

8 of 27

The InvenioRDM project has two goals:

9 of 27

The platform

A few highlights...

10 of 27

InvenioRDM stack

Invenio is JSON-native and provides RESTful APIs to make it easy to build apps on top of the framework
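As a rough illustration of what "JSON-native" and "RESTful" mean for an app built on top of the framework, the sketch below builds a record search URL and reads titles out of a JSON response. The host (https://localhost), the /api/records path, the query parameters, and the shape of the sample response are assumptions for illustration; check them against the API documentation of your own instance.

```python
import json
from urllib.parse import urlencode

# Assumption: a local instance like the one started on the "Standing up" slide.
BASE_URL = "https://localhost"


def build_search_url(query, size=10, page=1, base_url=BASE_URL):
    """Return a search URL for an Invenio-style records REST endpoint."""
    params = {"q": query, "size": size, "page": page}
    return f"{base_url}/api/records?{urlencode(params)}"


def extract_titles(response_text):
    """Pull record titles out of a JSON search response."""
    payload = json.loads(response_text)
    return [hit["metadata"]["title"] for hit in payload["hits"]["hits"]]


# Illustrative response body, not real data from any repository.
SAMPLE_RESPONSE = json.dumps(
    {
        "hits": {
            "total": 1,
            "hits": [{"id": "example-1", "metadata": {"title": "Example dataset"}}],
        }
    }
)

print(build_search_url("phenotype definitions", size=5))
print(extract_titles(SAMPLE_RESPONSE))
```

Because requests and responses are plain URLs and JSON, any HTTP client in any language can be used in place of these helper functions.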

11 of 27

InvenioRDM roadmap

February

    • Milestones: Draft governance and sustainability plan, mock-up feedback from collaborators
    • Release
      • Branding customization - institutional theming can be applied
      • First iteration of the Data model
      • Improved CLI with improved workflow
      • New documentation site for developers (http://inveniordm.docs.cern.ch)
      • Closer project tracking with enhanced structure and outreach

March

    • Milestones: First release for core plugins, review of February release
    • Release
      • Search permissions
      • Deposit page
      • Improved record page
      • Data model update
  • To see further ahead: https://invenio-software.org/products/rdm/roadmap/

12 of 27

Standing up InvenioRDM

1- Install invenio-cli

pip install invenio-cli

2- Initialize your project

invenio-cli init --flavour=RDM

3- Run it

cd <project name>

invenio-cli containerize

4- Visit https://localhost

firefox https://localhost

13 of 27

System requirements

Invenio can run in Docker, on virtual machines, or on physical machines. Invenio can run on a single machine or a cluster of hundreds of machines.

It all depends on exactly how much data you are handling and your performance requirements.

Small installation:

  • Web/app/background servers and Redis: 1 node
  • Database: 1 node
  • Elasticsearch: 1 node

Medium installation:

  • Load balancer: 1 node
  • Web/app servers and background workers: 2 nodes
  • Database: 1 node
  • Elasticsearch: 3 nodes
  • Redis/RabbitMQ: 1 node

Large installation:

  • Load balancer: 2 nodes (with DNS load balancing)
  • Web/app servers: 3+ nodes
  • Background workers: 3+ nodes
  • Database: 2 nodes (master/slave)
  • Elasticsearch: 5 nodes (3 data, 2 clients)
  • Redis: 3 nodes (HA setup)
  • RabbitMQ: 2 nodes (HA setup)
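For a quick sense of scale, the three tiers above can be tallied into minimum node counts. This is only a sketch of the arithmetic: the dictionary keys are shorthand labels, and entries listed as "3+" are counted at their minimum of 3.

```python
# Minimum node counts per role, transcribed from the three tiers above.
# Entries listed as "3+" are recorded at their minimum of 3.
INSTALLATIONS = {
    "small": {
        "web/app/background + Redis": 1,
        "database": 1,
        "Elasticsearch": 1,
    },
    "medium": {
        "load balancer": 1,
        "web/app + background workers": 2,
        "database": 1,
        "Elasticsearch": 3,
        "Redis/RabbitMQ": 1,
    },
    "large": {
        "load balancer": 2,
        "web/app": 3,
        "background workers": 3,
        "database": 2,
        "Elasticsearch": 5,
        "Redis": 3,
        "RabbitMQ": 2,
    },
}


def total_nodes(tier):
    """Sum the minimum node count across all roles in one tier."""
    return sum(INSTALLATIONS[tier].values())


for tier in INSTALLATIONS:
    print(f"{tier}: at least {total_nodes(tier)} nodes")
# small: at least 3 nodes
# medium: at least 8 nodes
# large: at least 20 nodes
```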

14 of 27

Search and retrieve datasets using standards-based documentation

Robust search enhanced by:

  • Standardized forms of name (LDAP + ORCID coming soon)
  • Standard subject terms (MeSH, Library of Congress Subject terms)
  • Standardized citation formats
  • Clear levels of access
  • Standard application of licenses

15 of 27

Data management for reproducibility and Open Access: study-focused resource types

InvenioRDM helps you store, manage and, if needed, share your study’s outputs:

  • Study-based resource types to manage a large range of assets
  • Reproducibility is enhanced: store research proposals, datasets, code
  • Be compliant with data sharing mandates
  • Cite and attribute the work of all contributors to research
  • Reuse deposited data or measures from other studies

16 of 27

Communities & Collections

Phenotype Definitions

XYZ Clinical Study

Community: Define your research group or other collaborative unit

Collection: Create multiple Collections under the umbrella of the Community. Within Collections, deposit and describe your:

Phenotype Definitions

Definitions

Characterizations

Evaluations

Metadata

Dissemination Strategy

Clinical Studies

Research Proposals

Protocols

Data Management Plans

Methods Descriptions

Measures

Case Reports

Datasets and Analyses

Collections bring together related groupings of documentation to communicate process, enable sharing of results, and support publication, compliance, and reproducibility

17 of 27

Collections & Clinical Studies

Store multiple datasets with large numbers of detailed results from each analysis, and reuse data generated by a single study.

Results presented in InvenioRDM are:

  • easy to find
  • browsable
  • publicly available
  • citable

Home in on the results you seek using InvenioRDM’s robust metadata of subject and resource type terms.

18 of 27

InvenioRDM incorporates contributor roles for all records. Deposit your SQL code, statistical analysis plan, database code, and other study documentation; receive credit; and group all documents in a Collection.

Biostatistician

Developer

Data Analyst

Gupta, Simran

Properly attribute all contributors to research

Gonzales, S., O’Keefe, L., Gutzman, K., Viger, G., Wescott, A., Farrow, B., . . . Holmes, K. (n.d.). Personas for the Translational Workforce. Journal of Clinical and Translational Science, 1-27. doi:10.1017/cts.2020.2

19 of 27

Collaborators: Work with them and discover new ones

InvenioRDM will allow private record sharing, so researchers can:

  • Share files with each other, but not anyone else in the university community or the public
  • Vet materials collaboratively and privately before switching records to ‘public’ for open access/data sharing

User 1’s files

User 2’s files

InvenioRDM will have a social component, allowing researchers to:

  • Follow other researchers
  • Receive updates when someone they follow deposits something
  • Manage requests to access files represented by a metadata-only record

20 of 27

The community

21 of 27

https://inveniosoftware.org/ and click on “RDM”

22 of 27

InvenioRDM collaborators

23 of 27

How can Invenio support the OHDSI community?

24 of 27

We’re managing a large multi-site project, harmonizing data from numerous sources and managing research projects. We want to create communities of practice to integrate theories, data, techniques, and tools.

I lead a large basic science research group. We use InvenioRDM to support reproducible science by packaging combined with big data mining and a desire to process collected data using the latest bioinformatics tools.

I am a clinical researcher. I need a way to pre-register protocols or research proposals, search on demographics of participants in similar studies, get insights into recruitment, and share portions of a study for compliance.

Our multi-institution health equity project uses InvenioRDM to collaborate with our community-based partners and credit these partnerships. We can share materials from community health events, project materials, training materials, annual reports, and lay summaries of research. InvenioRDM helps us to be better partners, accountable to collaborators and the community.

I’m an early career researcher just getting started on my research career. I need to “put my best foot forward” to showcase my work and demonstrate my expertise and collaborations. Invenio gives me a way to make all of my research efforts findable, and the metrics are helpful for reporting and highlighting my impact to my leadership.

My team wants to find out about clinical trial opportunities to offer patients all options for treatment. It is important to us to openly share the latest research with patients. InvenioRDM communities give us a way to make these materials openly available and packaged in a cohesive and attractive manner. As resources are updated, we can upload the new versions and track access.

Some Use Cases

Our institute wants a way to publish and disseminate content such as our handbook, lay summaries, and more. We want to credit all contributors and produce an attractive and interactive resource that can be easily updated.

25 of 27

FAIR: OHDSI & InvenioRDM

InvenioRDM’s records are made findable through each being issued a Digital Object Identifier (DOI), and through their metadata being indexed and made searchable immediately.

OMOP database summaries can be published in InvenioRDM as findable descriptor records to reference the database for reproducibility and citation.

Metadata in InvenioRDM are accessible because they are retrievable using a standardized communications protocol which is free and universally implementable.

OMOP data can be mapped through similar open protocols through SQL interfaces, though largely for secure querying. Results of analyses in multiple OMOP databases can be cataloged in InvenioRDM, and these records retrieved through the open protocol OAI-PMH.

InvenioRDM leverages metadata encoding (JSON) and vocabulary (FundRef, OpenAIRE, COAR Resource Types, etc.) standards to ensure maximum interoperability for records describing digital assets.

OMOP similarly ensures interoperability through its CDM and standardized vocabulary, and the OHDSI community goes beyond this work by providing a platform to enable an interoperable understanding of the analysis methods for healthcare data.

Ensuring the reusability of digital assets deposited in InvenioRDM is key and is achieved through assigning licenses and establishing provenance through registering users.

OHDSI’s Metadata Working Group is actively working toward attaching provenance information to OMOP records.
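The OAI-PMH retrieval mentioned under accessibility can be sketched as a plain HTTP GET. The verbs (ListRecords, GetRecord) and the oai_dc metadata prefix come from the OAI-PMH standard; the endpoint path /oai2d, the host, and the record identifier are assumptions for illustration and may differ on your instance.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: endpoint location on a local test instance.
OAI_ENDPOINT = "https://localhost/oai2d"


def build_oai_request(verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb, **arguments}
    return f"{OAI_ENDPOINT}?{urlencode(params)}"


# Harvest all records as Dublin Core.
list_url = build_oai_request("ListRecords", metadataPrefix="oai_dc")
print(list_url)

# Fetch a single record; the identifier below is hypothetical.
get_url = build_oai_request(
    "GetRecord",
    identifier="oai:localhost:abcde-12345",
    metadataPrefix="oai_dc",
)
print(parse_qs(urlsplit(get_url).query))
```

The response to either request is an XML document, so a harvester only needs an HTTP client and an XML parser to retrieve the catalogued results.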

26 of 27

Links

Northwestern's Proof of Concept: http://bit.ly/inveniordm-at-nu

    • Test Login: gla3975
    • Password: InvenioRDM@NU_2019

Install your own instance! https://inveniordm.docs.cern.ch/

27 of 27

With thanks…

Teams

  • The Invenio team @ CERN & RDM collaborators
  • Galter Health Sciences Library & Learning Center
  • Northwestern University Clinical and Translational Sciences Institute (NUCATS)
  • CTSA Program Center for Data to Health (CD2H) team
  • The NU Institute for Innovations in Developmental Sciences
  • Confederation of OA Repositories (COAR)

Support

Work presented here is supported in part by:

  • CERN Knowledge Transfer Fund
  • CD2H: U24TR002306 (NCATS)

  • NUCATS: UL1TR001422 (NCATS)

all of the InvenioRDM project partners

Guillaume Viger

Sara Gonzales

Lisa O’Keefe

Matt Carson