1 of 89

Research Data Management

Agenda

Date: Wednesday, 8 June 2022

Time: 9.00-10.30 CEST

2 of 89

Introduction

Ulrike Wittig (DE)

ELIXIR-DE and ELIXIR Data Platform ExCo

Heidelberg Institute for Theoretical Studies, Heidelberg, Germany

Time: 9.00-9.05

3 of 89

Research Data Management

#ELIXIR22

4 of 89

Data Management Systems as ELIXIR Services

Support researchers in managing their data and making them FAIR

5 of 89

Supporting FAIR Data in ELIXIR

FAIR Resources & Tools

Registries

Standards, Ontologies & Identifiers

Data Management Platforms

DMP & Stewardship Tools

FAIR Metadata Markup

Trusted Repositories

Core Data Resources

Deposition Databases & Portals

Scalable Curation

Sustainability

6 of 89

Supporting FAIR Data in ELIXIR

Communities & Focus Groups

FAIR Expertise & Training

Capability & Skills

Training Events & Material

Data Management Expert Network

RDM Guidelines

FAIR recipes

7 of 89

Workshop

“ELIXIR FAIR & Research Data Management know-how ecosystem”

Thursday 11:00-12:30

8 of 89

Contributing to EOSC Research Graphs: the role of FAIRsharing and Bioschemas

Susanna-Assunta Sansone (UK)

ELIXIR-UK and ELIXIR Interoperability Platform ExCo

Professor of Data Readiness, University of Oxford, UK.

In collaboration with:

Allyson Lister, FAIRsharing

Alasdair Gray, Bioschemas

Time: 9.05-9.20

9 of 89

#WeAreEOSC - let’s make sure our data resources are visible!

  • Making ELIXIR resources and services, in particular databases, knowledgebases and repositories, more discoverable in EOSC
    • Via the EOSC Research Graph by OpenAIRE

  • This talk does not cover what the EOSC Research Graph is in detail, what EOSC is for, whom it is for, etc.
    • Join our EOSC-related sessions this afternoon

10 of 89

A 10,000-foot view of the EOSC Research Graph by OpenAIRE

“Open, participatory research graph where products of the research life-cycle (e.g. scientific literature, research data, project, software) are semantically linked to each other and carry information about their access rights (i.e. if they are Open Access, Restricted, Embargoed, or Closed) and the sources from which they have been collected and where they are hosted”

URL: graph.openaire.eu

11 of 89

ELIXIR’s end goal: feed accurate information into the graph, easily

12 of 89

FAIRsharing and Bioschemas as information providers

  • FAIRsharing provides rich descriptions of the databases, including content types, access conditions, maintainers (ORCID), organizations (ROR), etc., interlinked with standards
  • We already have descriptions of many/most ELIXIR and EOSC-Life databases and standards!
  • Bioschemas provides key metadata for the discoverability of dataset content
  • Many ELIXIR databases’ pages are already marked up
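Because FAIRsharing records identify maintainers by ORCID, a registration form could reject mistyped iDs before they reach the graph. An ORCID iD ends in an ISO 7064 MOD 11-2 check character, so the check is cheap. A minimal sketch; the function names are hypothetical and this is not FAIRsharing code:

```python
import re

# An ORCID iD has four hyphen-separated groups; the final character is a
# check character (0-9 or X)
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """True if the iD (bare or as an https://orcid.org/ URL) is well formed and its check digit matches."""
    orcid = orcid.strip().removeprefix("https://orcid.org/")
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # True (ORCID's documented example iD)
print(is_valid_orcid("0000-0002-1825-0098"))  # False: last digit mistyped
```

The same idea would apply to ROR IDs, which also carry a checksum, though with a different scheme.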

13 of 89

Prototyping the process, unfunded - your role and our work

Mapping and harvesting

We are mapping Bioschemas to the DataCite schema (Enrico Ottonello, Andreas Czerniak, Nick Juty, Alasdair J. G. Gray)

We are mapping the FAIRsharing model and database IDs to the OpenAIRE model (Ramon Granell, Alessia Bardi, Delphine Dauga, Allyson Lister)

OpenAIRE retrieves general information from FAIRsharing and follows the link to the sitemap, where it harvests the Bioschemas markup
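The harvesting path starts from the sitemap a database links to. A minimal sketch of that first step, reading page URLs from a sitemap with Python's standard library; the sitemap content and domain are hypothetical, and this is not OpenAIRE's harvester:

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]

# Hypothetical sitemap of a database exposing two record pages
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://db.example.org/record/1</loc><lastmod>2022-05-01</lastmod></url>
  <url><loc>https://db.example.org/record/2</loc></url>
</urlset>"""

for page in urls_from_sitemap(SAMPLE_SITEMAP):
    print(page)  # each page would then be fetched and its Bioschemas markup extracted
```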

In FAIRsharing:

1. You register (or claim) your database, adding (or vetting) additional descriptors, including:

- maintainers, as individuals and organizations

- publications

- data access conditions

- standards implemented

2. You specify the Bioschemas access points

In Bioschemas:

1. You mark up your database’s pages, including links to:

  • the containing dataset
  • publications
  • equivalent resources on other sites

2. You create a sitemap

We do* the rest of the work for you!

*note: this is an unfunded pilot!
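The two Bioschemas-side steps above (marking up pages and creating a sitemap) could look roughly like this. It is a sketch under stated assumptions: the record is typed as schema.org `Dataset`, and `isPartOf` (containing dataset), `citation` (publications) and `sameAs` (equivalent resources) are standard schema.org properties chosen for illustration; the relevant Bioschemas profile should be checked for the exact type and required properties. All URLs and the DOI are placeholders.

```python
import json
import xml.etree.ElementTree as ET

def record_markup(page_url, name, dataset_url, publication_dois, same_as):
    """Return a JSON-LD <script> element describing one record page (assumed type: Dataset)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": page_url,
        "url": page_url,
        "name": name,
        "isPartOf": {"@type": "Dataset", "@id": dataset_url},  # containing dataset
        "citation": [f"https://doi.org/{doi}" for doi in publication_dois],  # publications
        "sameAs": list(same_as),  # equivalent resources on other sites
    }
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"

def build_sitemap(page_urls):
    """Return a minimal sitemaps.org <urlset> listing the marked-up pages."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page_url in page_urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical record page; every URL and the DOI are placeholders
snippet = record_markup(
    "https://db.example.org/record/1",
    "Example record 1",
    "https://db.example.org/dataset",
    ["10.1234/example.5678"],
    ["https://other.example.org/entry/A1"],
)
sitemap = build_sitemap(["https://db.example.org/record/1", "https://db.example.org/record/2"])
print(snippet)
print(sitemap)
```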

14 of 89

Bioschemas - database records: consolidation of concepts

sameAs
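Consolidating concepts through `sameAs` amounts to computing connected components: pages linked by any chain of `sameAs` statements describe the same thing. A minimal union-find sketch, assuming the links have already been harvested as (page, equivalent) pairs; the identifiers are hypothetical:

```python
def consolidate(same_as_links):
    """Group IRIs into concepts: IRIs joined by any chain of sameAs links share a group."""
    parent = {}

    def find(iri):
        parent.setdefault(iri, iri)
        while parent[iri] != iri:
            parent[iri] = parent[parent[iri]]  # path halving keeps the trees shallow
            iri = parent[iri]
        return iri

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for subject, target in same_as_links:
        union(subject, target)

    groups = {}
    for iri in list(parent):
        groups.setdefault(find(iri), set()).add(iri)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical harvested sameAs statements: (page IRI, equivalent IRI)
links = [
    ("https://a.example.org/p1", "https://hub.example.org/P1"),
    ("https://b.example.org/x9", "https://hub.example.org/P1"),
    ("https://c.example.org/q", "https://c.example.org/r"),
]
print(consolidate(links))
```

Pages from sites a and b end up in one concept because both point at the same hub IRI, even though they never reference each other directly.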

15 of 89

FAIRsharing - database as a whole: full curated descriptions

DOI: 10.25504/FAIRsharing.jwra3e

DOI: 10.25504/FAIRsharing.dt9z89

16 of 89

FAIRsharing - database as a whole

DOI: 10.25504/FAIRsharing.dt9z89

Overview information and status: License, Maintainer(s), Standard(s), Database(s), API, Life cycle status

17 of 89

FAIRsharing - database as a whole

DOI: 10.25504/FAIRsharing.dt9z89

Subject classification

Classification is powered by our Subject Ontology of 436 terms

URL: fairsharing.org/browse/subject

URL: github.com/FAIRsharing/subject-ontology

18 of 89

FAIRsharing - database as a whole

DOI: 10.25504/FAIRsharing.dt9z89

Organizations and publications

19 of 89

FAIRsharing - database as a whole

DOI: 10.25504/FAIRsharing.dt9z89

Standards implemented and related databases

20 of 89

FAIRsharing - database as a whole

DOI: 10.25504/FAIRsharing.dt9z89

Access points and conditions

21 of 89

FAIRsharing - building the EOSC-Life map of resources

These are profiles of the organizations and their RIs, with their data resources and standards

URL: fairsharing.org/graph/3513

22 of 89

FAIRsharing - signposting, surfacing resources to EOSC

Output:
  • Organization pages in FAIRsharing give an overview of their data resources and standards
  • A curated EOSC-Life Collection in FAIRsharing of data resources and their standards
  • Descriptions of data resources in FAIRsharing are accessible in the EOSC ecosystem

Task & owner:
  • RI and organization pages are automatically created
  • Resource managers: add or claim, and describe, the data resources and standards they have developed, and associate them with their organisation(s) and RI(s)
  • Create and maintain mappings of the descriptions of data resources ((cross)links)

Prototyping this work in

23 of 89

FAIRsharing - working with international communities to create subject-specific collections of resources, e.g.:

Collection URL: fairsharing.org/graph/3513;

each record has a DOI

Collection URL: fairsharing.org/graph/3515;

each record has a DOI

24 of 89

FAIRsharing - launching the Community Curation Programme!

https://eoscfuture-grants.eu/node/262

https://fairsharing.org/community_curation

Soft launch with the first life science curators:

She is the first ELIXIR awardee of the Domain Ambassador Programme!

25 of 89

Play your part and help us surface proper info to EOSC!

Mapping and harvesting

Markup your database’s pages

Register or claim your database

Thank you to the Bioschemas, FAIRsharing and OpenAIRE teams!

We do* the rest of the work for you!

*note: this is an unfunded pilot!

26 of 89

RO-Crate for Research Reproducibility & Data Management - Carole Goble

Carole Goble (UK) with Stian Soiland-Reyes (UK)

Time: 9.20-9.35

Slides: linked here

27 of 89

Highlights of RDM within the ELIXIR Communities:

  • Rare Disease Community and RDM

  • Plants: Extending Bioschemas to plant biology, experience with real data

  • Community DBs, Aggregator DB for Niche communities

Time: 9.35-10.05

28 of 89

Rare Disease Community and Research Data Management

Nirupama Benis (NL)

Time: 9.35-9.45

29 of 89

Rare disease community and research data management

Nirupama Benis and Marco Roos

30 of 89

EJP RD

  • Create a comprehensive, sustainable ecosystem allowing a virtuous circle between research, care and medical innovation
  • Scattered data
  • Benefits from going FAIR

31 of 89

32 of 89

European Reference Networks (ERNs)

  • 24 ERNs

  • Set up patient registries

  • Share patient data responsibly

  • Perform research that improves patients’ lives

33 of 89

34 of 89

Research data management

35 of 89

Rare disease registries

36 of 89

Collect data

  • Common data elements (CDEs)

  • Proposed by the JRC

  • For RD registries

  • Promotes interoperability

  • Covers key data points

  • Not perfect but a great start

37 of 89

Analyse data

  • EJP RD Virtual Platform allows federated queries over several RD resources

  • Link registry data to information from other resources

  • New features for data analyses

  • Workshop tomorrow (11.00-12.30 CEST)

https://vp.ejprarediseases.org/
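The general shape of a federated query, sending one question to several resources and combining the answers while tolerating unavailable ones, can be sketched as below. This is not the Virtual Platform's API: the resources are hypothetical callables returning record counts, and the query format is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def federated_count(resources, query, timeout=10.0):
    """Send one query to several resources in parallel and combine their counts.

    `resources` maps a resource name to a callable that takes the query and returns a
    record count. A resource that raises (or misses the timeout) is listed as
    unavailable instead of failing the whole federated query. The pool still waits
    for a slow call to finish when it shuts down.
    """
    counts, unavailable = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(resources))) as pool:
        futures = {name: pool.submit(ask, query) for name, ask in resources.items()}
        for name, future in futures.items():
            try:
                counts[name] = future.result(timeout=timeout)
            except Exception:
                unavailable.append(name)
    return {"total": sum(counts.values()), "per_resource": counts, "unavailable": unavailable}

# Hypothetical resources standing in for registries and other RD data sources
def offline_registry(query):
    raise ConnectionError("resource unreachable")

resources = {
    "registry_a": lambda query: 12,
    "registry_b": lambda query: 5,
    "registry_c": offline_registry,
}
result = federated_count(resources, {"diagnosis": "example-disease"})
print(result)
```

Reporting unavailable resources explicitly, rather than silently returning a smaller total, lets the user judge how complete the answer is.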

38 of 89

Share data

  • Interoperable data
  • CDE semantic model (CDE-in-a-box)
  • Can be connected to the EJP RD VP
  • Share data with other ERNs or interested parties
  • Collect information on data sharing from registries
  • Mappings to widely used data models
    • OMOP
    • CDISC
    • FHIR
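As an illustration of one such mapping, the sketch below turns a registry record into a minimal FHIR R4 `Patient` resource. The input field names and sex codes are hypothetical, not the CDE specification; `identifier`, `gender`, `birthDate` and `deceasedDateTime` are standard `Patient` elements.

```python
# Hypothetical registry codes for sex, mapped to FHIR administrative-gender codes
SEX_TO_FHIR_GENDER = {"F": "female", "M": "male", "O": "other", "U": "unknown"}

def to_fhir_patient(record, identifier_system):
    """Map a (hypothetical) registry record to a minimal FHIR R4 Patient resource."""
    patient = {
        "resourceType": "Patient",
        # The registry pseudonym becomes a business identifier; no direct identifiers are copied
        "identifier": [{"system": identifier_system, "value": record["pseudonym"]}],
        "gender": SEX_TO_FHIR_GENDER.get(record.get("sex"), "unknown"),
    }
    if record.get("birth_date"):
        patient["birthDate"] = record["birth_date"]  # FHIR date: YYYY, YYYY-MM or YYYY-MM-DD
    if record.get("death_date"):
        patient["deceasedDateTime"] = record["death_date"]
    return patient

patient = to_fhir_patient(
    {"pseudonym": "RD-000123", "sex": "F", "birth_date": "1990-05"},
    identifier_system="https://registry.example.org/pseudonyms",
)
print(patient)
```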

39 of 89

Reuse data

  • Sensitive data

  • Data use conditions

  • Informed consent form

  • Semantic model describing consent

  • Existing technology

40 of 89

41 of 89

Outlook: R&D on practical FAIRification support

  • Maturation of FAIRification support infrastructure
    • FAIRification tools for stewards (FAIRopoly, RDMkit, DSW, cookbook)
    • FAIR-generating registry software (e.g. CastorEDC, LOVD, MOLGENIS)
    • ‘FAIR-in-a-box’ to ‘ontologise’ data and metadata at source
    • SaaS FAIR Data Point & model-to-API conversion services
    • Automatically ‘find and use’ FAIR artefacts in FAIR registries (EOSC-Life)
  • Tooling for ontology-based data modelling is still R&D
  • Models for human-sustainable FAIRification services are emerging
  • Analytics examples on (federated) FAIR data: see the AHM workshop!

42 of 89

Questions or Comments

43 of 89

Plants: Extending Bioschemas to plant biology, experience with real data

Sebastian Beier (DE)

Time: 9.45-9.55

44 of 89

Bioschemas in a nutshell

  • Provides a vocabulary of terms → improves data discoverability and interoperability in the Life Sciences
  • Basic idea: annotate web pages with lightweight markup (in Plants: Genes, Phenotypes, Species, Genus, Accession, etc.)

Connect with Plant standards

45 of 89

MIAPPE for Bioschemas

  • MIAPPE: Minimal Information about a Plant Phenotyping Experiment
  • Set of 94 variables (+48 variables for environment)
  • Mapping to Bioschemas (started at BioHackathon 2021) → meaningful subset
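A mapping to a "meaningful subset" can be expressed as a lookup table from MIAPPE fields to schema.org properties that also reports what was left out. In this sketch the field names are assumed from MIAPPE v1.1 investigation-level fields and the chosen properties are illustrative; it is not the BioHackathon mapping itself.

```python
# Hypothetical subset: MIAPPE-style investigation fields -> schema.org Dataset properties
MIAPPE_TO_SCHEMA = {
    "Investigation unique ID": "identifier",
    "Investigation title": "name",
    "Investigation description": "description",
    "Public release date": "datePublished",
    "License": "license",
}

def miappe_to_dataset(miappe_fields):
    """Build a schema.org Dataset from mapped MIAPPE fields and list the fields left unmapped."""
    dataset = {"@context": "https://schema.org", "@type": "Dataset"}
    unmapped = []
    for field_name, value in miappe_fields.items():
        prop = MIAPPE_TO_SCHEMA.get(field_name)
        if prop is None:
            unmapped.append(field_name)
        elif value:  # skip empty values instead of emitting blank properties
            dataset[prop] = value
    return dataset, unmapped

dataset, unmapped = miappe_to_dataset({
    "Investigation unique ID": "INV-001",
    "Investigation title": "Example drought trial",
    "License": "https://creativecommons.org/licenses/by/4.0/",
    "Associated publication": "doi:10.1234/placeholder",
})
print(dataset, unmapped)
```

Keeping the unmapped list visible makes it easy to see which of the 94 variables still need a home in Bioschemas.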

46 of 89

Use Case #1

47 of 89

Use Case #1

  • AgriSchemas: Modelling Agronomy with Bioschemas
  • Idea: use lightweight models for building datasets, not only for web page annotations
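Using a lightweight model to build a dataset, rather than only to annotate pages, means emitting the same schema.org terms as RDF. The sketch below writes N-Triples describing a dataset with `Dataset`, `license`, `includedInDataCatalog` and `distribution`; the IRIs are hypothetical, and the AgriSchemas pipeline may well do this differently.

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def iri(value):
    return f"<{value}>"

def literal(text):
    # Escape backslashes, quotes and newlines as required in N-Triples string literals
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def dataset_triples(dataset_iri, name, license_iri, catalog_iri, download_url):
    """Describe a dataset and one downloadable distribution with schema.org terms."""
    s = iri(dataset_iri)
    dist = iri(dataset_iri + "#download")  # hypothetical IRI for the distribution node
    rows = [
        (s, iri(RDF_TYPE), iri(SCHEMA + "Dataset")),
        (s, iri(SCHEMA + "name"), literal(name)),
        (s, iri(SCHEMA + "license"), iri(license_iri)),
        (s, iri(SCHEMA + "includedInDataCatalog"), iri(catalog_iri)),
        (s, iri(SCHEMA + "distribution"), dist),
        (dist, iri(RDF_TYPE), iri(SCHEMA + "DataDownload")),
        (dist, iri(SCHEMA + "contentUrl"), iri(download_url)),
    ]
    return "\n".join(f"{subj} {pred} {obj} ." for subj, pred, obj in rows)

nt = dataset_triples(
    "https://data.example.org/dataset/wheat-genes",
    'Wheat gene network "v1"',
    "https://creativecommons.org/licenses/by/4.0/",
    "https://data.example.org/catalog",
    "https://data.example.org/downloads/wheat-genes.nt",
)
print(nt)
```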

48 of 89

Use Case #1

| Use case | Data types | Data sources | Status |
|---|---|---|---|
| Molecular Biology | Gene, Protein, Pathway; encodes, participates | Via Knetminer: ENSEMBL, UniProt, TILLING, wheat-expression.com, KEGG | Done |
| Ontology Annotations | Ontology Term (schema:DefinedTerm); dc:type, schema:additionalType | Via Knetminer: GO, PO, CROP-Onto | Done |
| Experiments | Study, agri:StudyFactor, PropertyValue | EBI/GXA, GLTen, MIAPPE/BrAPI sources, ? | GXA done; MIAPPE: much work done during the ELIXIR BioHackathon, continuing with monthly calls; GLTen use case drafted |
| Literature | agri:ScholarlyPublication; mentions | Via Knetminer: PubMed | Done |
| Gene Expression | bioschema:expressedIn, reified statements, agri:evidence, agri:pvalue, agri:baseCondition | EBI/GXA, via Knetminer: wheat-expression.com | GXA |
| Host-pathogen interaction | Gene, Phenotype, agri:ScholarlyPublication; agri:HostPathogenInteraction; agri:evidence | PHI-Base | Use case drafted |
| Weather | ? | ? | TO DO |
| Dataset metadata | Dataset, DataCatalog; license, distribution | knetminer.org/data | Ongoing |

49 of 89

Use Case #2

  • Schema.org and Dublin Core
  • Bioschemas ‘Dataset’ and ‘Taxon’ Profiles → Find specific dataset or look for species
  • Automatic addition to context pages
  • Accepted by Nature Publishing Group & Oxford GigaScience
  • Registered in re3data.org, OpenAIRE, FAIRsharing.org and DataCite

50 of 89

Next steps

  • BioHackathon 2022 project: link and finalise the MIAPPE-Bioschemas mapping
  • Real data to be converted and integrated into the KnetMiner/AgriSchemas endpoint
  • Merge with similar projects (e.g. PIPPA)

51 of 89

Acknowledgements

Marco Brandizi

Daniel Arend

Erik Ralfs

Keywan Hassani-Pak

Cyril Pommier

Alasdair Gray

52 of 89

Appendix: e!DAL PGP Movie

https://edal-pgp.ipk-gatersleben.de/movie/SimpleShow.mp4

53 of 89

Community DBs, Aggregator DB for Niche communities

Damiano Piovesan (IT)

Time: 9.55-10.05

54 of 89

IDRs – Ubiquitous and functionally important

  • 1/3 of the human proteome
  • ~100,000 transient, conditional, regulatory interaction modules
  • 70% of experimentally validated PTMs

55 of 89

IDP research – experimental

  • IDP research spans several experimental fields studying protein structure and function
  • Most of the methods used are specific to the field of IDPs

56 of 89

IDP community goals

Necci et al. (2018) Database

Monzon et al. (2020) International Journal of Molecular Sciences

  1. Simplify IDP data access and dissemination
  2. Drive the curation of IDP literature
  3. Develop a centralised knowledgebase for IDP data
  4. Integrate IDP data into ELIXIR Core Data Resources
  5. Improve tools for IDP analysis

IDEAL (Nagoya University)

MFIB / DIBS (ELTE University)

IDPcentral

Co-leads (former and current): Silvio Tosatto, Norman Davey, Wim Vranken, Zsuzsanna Dosztányi, Damiano Piovesan

57 of 89

IDPcentral vs IMEx

Mimic the IMEx consortium?

  • Shared curation effort
  • Shared export format
  • Single dataset
  • Single detailed curation model

58 of 89

IDPcentral and Bioschemas

Protein: name, description, ...

SequenceAnnotation: name, description, ...

SequenceRange: rangeStart, rangeEnd, ...

No need for custom APIs

FAIR community registry of Bioschemas metadata

Concept merging
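When several sources annotate disordered regions of the same protein as `SequenceRange` values (`rangeStart`, `rangeEnd`), one simple way to combine them is to merge overlapping ranges. This sketch assumes 1-based inclusive coordinates and uses hypothetical example ranges; it illustrates range merging in general, not IDPcentral's own merging step.

```python
def merge_ranges(ranges):
    """Merge 1-based, inclusive (rangeStart, rangeEnd) pairs that overlap or are adjacent."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current region
        else:
            merged.append([start, end])  # start a new region
    return [tuple(region) for region in merged]

def disorder_fraction(ranges, sequence_length):
    """Fraction of residues covered by at least one annotated range."""
    covered = sum(end - start + 1 for start, end in merge_ranges(ranges))
    return covered / sequence_length

# Hypothetical annotations of one 200-residue protein from three sources
source_a = [(1, 30), (100, 120)]
source_b = [(25, 40)]
source_c = [(121, 130)]
regions = merge_ranges(source_a + source_b + source_c)  # [(1, 40), (100, 130)]
fraction = disorder_fraction(source_a + source_b + source_c, 200)  # 71 / 200
print(regions, fraction)
```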

59 of 89

Bioschemas Markup for IDP

Colour key: red = Schema.org, blue = pending Schema.org, green = Bioschemas

  • Protein
  • SequenceAnnotation
  • SequenceRange
  • Taxon
  • Dataset
  • ScholarlyArticle

Not shown:

  • DataCatalog

60 of 89

BMUSE: Bioschemas Markup Scraper and Extractor

  • Data harvester
    • List of URLs and sitemaps
  • Extracts markup
    • JSON-LD or RDFa or both
    • Static or dynamic
  • Returns markup with provenance
    • Where
    • When
    • Tool version
  • File per page of harvested markup
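The extraction and provenance steps can be sketched with Python's standard HTML parser. This handles only JSON-LD in static HTML (no RDFa, no JavaScript rendering), the function names and provenance keys are hypothetical, and it is not BMUSE's implementation.

```python
import json
from datetime import datetime, timezone
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks from static HTML."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside, self._chunks = True, []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                self.blocks.append(json.loads("".join(self._chunks)))
            except ValueError:
                pass  # skip malformed markup rather than abort the whole page

def harvest_page(url, html, tool_version="sketch-0.1"):
    """Return the page's JSON-LD together with where/when/which-tool provenance."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return {
        "url": url,  # where
        "harvested_at": datetime.now(timezone.utc).isoformat(),  # when
        "tool_version": tool_version,  # which harvester version
        "markup": extractor.blocks,
    }

page = """<html><head>
<script type="application/ld+json">{"@type": "Protein", "name": "Example protein"}</script>
<script>console.log("not markup");</script>
</head><body></body></html>"""
record = harvest_page("https://idp.example.org/protein/1", page)
print(json.dumps(record, indent=2))  # one such record per harvested page
```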

61 of 89

Harvested Markup

  • Markup harvested through standard API
    • HTTP GET requests
    • Saves time as common API and model (Bioschemas) for all sites
  • Harvested markup is page centric
    • Multiple sites represent the same concept
    • Sites use their own IDs

No need for custom APIs

Concept merging

62 of 89

IDPcentral as a Knowledge Graph (IDP-KG)

63 of 89

Querying IDP-KG

Count by type

Protein information

Annotation count per protein

Annotation information

Annotation count by term code
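Queries such as "count by type", "annotation count per protein" or "annotation count by term code" are group-and-count aggregations. Over a graph held as (subject, predicate, object) triples, the same idea looks like this in plain Python; the triples and the predicates `ex:annotates` and `ex:term` are hypothetical, whereas the IDP-KG itself is queried with SPARQL.

```python
from collections import Counter

def count_objects(triples, predicate):
    """Count how many distinct subjects point to each object via `predicate`.

    Roughly equivalent SPARQL:
        SELECT ?o (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s <predicate> ?o } GROUP BY ?o
    """
    return Counter(o for s, p, o in set(triples) if p == predicate)

# Hypothetical harvested triples; the duplicated Protein typing came from two sites
triples = [
    ("ex:prot1", "rdf:type", "schema:Protein"),
    ("ex:prot1", "rdf:type", "schema:Protein"),
    ("ex:prot2", "rdf:type", "schema:Protein"),
    ("ex:ann1", "rdf:type", "schema:SequenceAnnotation"),
    ("ex:ann2", "rdf:type", "schema:SequenceAnnotation"),
    ("ex:ann3", "rdf:type", "schema:SequenceAnnotation"),
    ("ex:ann1", "ex:annotates", "ex:prot1"),
    ("ex:ann2", "ex:annotates", "ex:prot1"),
    ("ex:ann3", "ex:annotates", "ex:prot2"),
    ("ex:ann1", "ex:term", "term:A"),
    ("ex:ann2", "ex:term", "term:A"),
    ("ex:ann3", "ex:term", "term:B"),
]
print(count_objects(triples, "rdf:type"))      # count by type
print(count_objects(triples, "ex:annotates"))  # annotation count per protein
print(count_objects(triples, "ex:term"))       # annotation count by term code
```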

64 of 89

IDPcentral as a registry

https://idpcentral.org/registry

The IDPcentral registry is just a (protein-centric) view of the IDP-KG

The IDP-KG can be expanded to include external SPARQL endpoints (e.g. UniProt, to get PDB data)
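Expanding the graph with an external endpoint can use a SPARQL 1.1 federated query (`SERVICE` clause). The sketch below only builds the query and request URL; it assumes the IDP-KG identifies proteins with UniProt IRIs, the local endpoint URL is hypothetical, and the triple pattern (`rdfs:seeAlso` plus `up:database`) follows UniProt's RDF modelling of cross-references as far as we know, so it should be checked against UniProt's published query examples.

```python
from urllib.parse import urlencode

UNIPROT_SPARQL = "https://sparql.uniprot.org/sparql"

def pdb_crossref_query(accessions):
    """Build a SPARQL 1.1 query that asks UniProt (via SERVICE) for PDB cross-references."""
    values = " ".join(f"uniprotkb:{acc}" for acc in accessions)
    return f"""PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?protein ?pdb WHERE {{
  VALUES ?protein {{ {values} }}
  SERVICE <{UNIPROT_SPARQL}> {{
    ?protein rdfs:seeAlso ?pdb .
    ?pdb up:database <http://purl.uniprot.org/database/PDB> .
  }}
}}"""

def query_url(endpoint, query):
    """GET URL for a SPARQL endpoint, passing the query in the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

# Hypothetical local IDP-KG endpoint; P04637 is the UniProt accession of human p53
query = pdb_crossref_query(["P04637"])
url = query_url("https://idp-kg.example.org/sparql", query)
print(query)
```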

65 of 89

Conclusions

Summary

  • Markup added to three IDP sources
  • Markup harvested using BMUSE
  • Standard HTTP GET requests
  • Markup transformed into IDP-KG
  • Proteins merged from multiple sources
  • Analysis queries to explore IDP-KG
  • IDPcentral registry prototype

Next Steps

  • Add more IDP sources
  • Take IDPcentral to production
  • Feed data into OpenAIRE Research Graph

66 of 89

BioStudies database - filling in gaps in research data publishing

Ugis Sarkans (EMBL-EBI)

Time: 10.05-10.15

67 of 89

BioStudies – aggregating all research study data

  • Facilitate transparency and reproducibility of research, by aggregating all the outputs of a study, the data package, in a single place
  • A single study may be associated with a single publication, or may contain data from a unit of work in a larger project.

68 of 89

BioStudies across the research life cycle

69 of 89

BioStudies usage examples

70 of 89

BioStudies usage examples

71 of 89

BioStudies usage examples

72 of 89

BioStudies usage examples

73 of 89

BioStudies usage examples

74 of 89

BioStudies for a new data type – why does it work?

  • Generic underlying data model – files, links, flexible metadata, hierarchies if necessary
  • Robust, scalable implementation
  • Services typical for a range of life sciences data repositories
    • Data file upload and download – FTP, Aspera, Web
    • Metadata capture via web forms or via spreadsheets
    • Data lifecycle: pre-publication, access for reviewers, warning users about the upcoming release, post-publication modifications etc.
    • Integration into the larger scientific ecosystem - ORCIDs, RORs, DOIs etc. (in progress)
  • Data access
    • “Data collection” concept
    • Collection-specific search facets
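The "generic underlying data model" idea (files, links, flexible metadata, hierarchies if necessary) can be illustrated with a recursive section type. This is a hypothetical sketch, not BioStudies' actual submission format; the section names and paths are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    """Hypothetical study section: free-form attributes, files, links and subsections."""
    section_type: str
    attributes: dict = field(default_factory=dict)
    files: list = field(default_factory=list)
    links: list = field(default_factory=list)
    subsections: list = field(default_factory=list)

    def all_files(self):
        """Yield file paths from this section and every nested subsection, depth-first."""
        yield from self.files
        for sub in self.subsections:
            yield from sub.all_files()

study = Section(
    "Study",
    attributes={"Title": "Example multi-omics study"},
    files=["protocol.pdf"],
    subsections=[
        Section("Imaging data", files=["images/plate1.tif"]),
        Section(
            "Analysis",
            files=["results/summary.csv"],
            links=["https://code.example.org/analysis"],
        ),
    ],
)
print(list(study.all_files()))
```

Because metadata are free-form attributes rather than fixed columns, a new data type can be accommodated by adding sections and attributes instead of changing the schema.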

75 of 89

Ahmed Ali

Awais Athar

Ehsan Behrangi

Juan Rada

Jhoan Munoz

Team

Funding

Nestor Diaz

In collaboration with

76 of 89

Community of Practice in training on data management & data stewardship

Daniel Wibberg

ELIXIR-DE training coordinator, ELIXIR Training Platform and ELIXIR-CONVERGE WP2 member

Forschungszentrum Jülich GmbH, de.NBI, DE.

Time: 10.15-10.25

77 of 89

ELIXIR-CONVERGE

  • The purpose of ELIXIR-CONVERGE is to establish good life-science data management, reproducibility and reuse: an infrastructure foundation for Open Science

  • ELIXIR-CONVERGE does this by
    • Consolidating good practices and providing a common toolkit
    • Developing the national operations and brokering capabilities of the distributed ELIXIR research infrastructure
    • Disseminating outcomes to research communities throughout the ERA

  • Thus, ELIXIR-CONVERGE takes the next step to realise a European data federation driven by connected national operations, strategically managed via national research infrastructure roadmaps

78 of 89

WP2 at a glance: Data Management and Stewardship Training by the Nodes for the Nodes

Co-production model: connected to all other WPs in CONVERGE (WP1, WP3, WP4, WP5, WP6, WP7, WP8, WP9)

22 ELIXIR Nodes

114 person-months (PM)

Aimed at

  • Data stewards
  • Researchers
  • Tool developers
  • Trainers
  • Design, build and deliver DM/DS courses
  • Build expertise in DM/DS in the Nodes
  • Build DM/DS training expertise in the Nodes
  • In alignment with national DM/DS strategies

79 of 89

The idea: Community of Practice in Data Management and Stewardship

Diagram: Node DS/DM communities, DS/DM trainer community of practice, ELIXIR-CONVERGE, hackathons, community meetings, communities of special interest

  • Developing training material
  • Developing learning paths
  • Developing Tools for DM/DS training
  • Experience exchange
  • Best practice
  • Networking
  • Train the Trainer
  • DMP Training
  • FAIR Training
  • Tools Training

80 of 89

Online Kickoff Meeting – 24 March

12:00-12:05 | Welcome (Daniel Wibberg, ELIXIR-DE, de.NBI)

12:05-12:15 | Introduction to ELIXIR-CONVERGE with focus on WP2 (Celia van Gelder, ELIXIR-NL, DTL)

12:15-12:45 | Examples of national DM/DS communities involved in training

NFDI - Germany (Daniel Tschink, NFDI4Biodiversity, GFBio)

DaSH Project - UK (Robert Andrews, DaSH Project, Cardiff University)

12:45-13:05 | Short introductions to the first community events: training material hackathons and their topics

13:10-13:15 | Introduction to CoP (Daniel Wibberg, ELIXIR-DE, de.NBI)

13:15-13:35 | Breakout Room Discussions of important community questions

13:35-14:00 | Result Presentation, Summary and Goodbye

81 of 89

Online Kickoff Meeting – 24 March – Who was there?

82 of 89

Online Kickoff Meeting – 24 March – What was discussed?

83 of 89

First CoP events – Hackathon series for training material

  • DMP (Nils-Christian Lübke, Helena Schnitzer, Daniel Wibberg) – ELIXIR-DE
    • 29 March, 1 April, 8 April, 10 May, 3 June
  • DM/DS Basics (Ana Melo, Brane Leskosek, Erik Hjerde, Wolmar Akerström, Stephan Nylinder) – ELIXIR-PT, SI, NO & SE
    • 8 June, 22 June
  • FAIR (Mijke Jetten, Martijn Kersloot, Stephan Nylinder) - ELIXIR-NL & SE
    • 29 June
  • Reproducibility (Alexia Cardona, Nazeefa Fatima) – ELIXIR-UK & NO
    • ?

84 of 89

Example: DMP Hackathon series for training material

  • In total, we had 25 registrations and 20 participants
    • Five hackathon days: 29 March, 1 April, 8 April, 10 May, 3 June

  • 3 slide decks
    • Introduction
    • How to write a DMP?
    • Recommendations

85 of 89

CoP next steps: 2nd Event – 21 June

13:00-13:05 | Welcome (Daniel Wibberg, ELIXIR-DE, de.NBI)

13:05-13:15 | Results of the discussion round at the kickoff event and next steps (Daniel Wibberg, ELIXIR-DE, de.NBI)

13:15-13:25 | Current status of the Hackathon series (Helena Schnitzer, ELIXIR-DE, de.NBI)

13:25-13:40 | FAIRDOM & Training (Ulrike Wittig, ELIXIR-DE, FAIRDOM)

13:40-13:55 | DataPlant & Training (Sebastian Beier, ELIXIR-DE, DataPlant)

13:55-14:00 | Summary and Goodbye

86 of 89

CoP future plan

  • Discussion of the goals of the CoP at the WP2 breakout session on Friday
  • More hackathons for training material
  • Another online event in November
    • Presentations about well-known DM/DS tools and how they are involved in training
    • Discussion based on questions from the tool developers
  • And more…

87 of 89

Acknowledgements

Helena Schnitzer

Nils Lübke

Celia van Gelder

Ana Melo

Brane Leskosek

Erik Hjerde

Wolmar Akerström

Stephan Nylinder

Mijke Jetten

Alexia Cardona

Nazeefa Fatima

88 of 89

Closing Remarks

Time: 10.25-10.30

Carole Goble (UK) - Chair

89 of 89

Q+A Discussion

Time: at the end, if time allows

Slido: go to sli.do in your web browser and enter the code 2900782