1 of 68

Open Science in the project lifecycle - access & reuse

Elin Kronander

Open Science in the Swedish context, 2025-05, Stockholm

2 of 68

Repositories are key for sharing and reuse

Re-colored data Life Cycle by RDMkit used under CC-BY

3 of 68

Reuse - open is not enough

4 of 68

Reuse - open is not enough

What do we need in order to be able to reuse a research output?

5 of 68

Reuse - open is not enough

What do we need in order to be able to reuse a research output?

  • That the object exists/where to look for it
  • Detailed information about the object
  • How the object can be accessed
  • How the object could/should be used
  • Who has the right to use it, and how

6 of 68

Reuse - open is not enough

What do we need in order to be able to reuse a research output?

  • That the object exists/where to look for it - Repositories
  • Detailed information about the object - Metadata
  • How the object can be accessed - Persistent Identifiers
  • How the object could/should be used - Metadata
  • Who has the right to use it, and how - Licenses

7 of 68

Reuse - open is not enough

What do we need in order to be able to reuse a research output?

8 of 68

Screenshot from the SciLifeLab Data Repository

9 of 68

Types of repositories

  • Domain/type-specific:
  • General purpose:
    • Second best - long-term plan, might cost (now or in future), good reach but less specific in metadata → more difficult for future users to judge if a dataset will be useful
    • E.g. Zenodo, (SciLifeLab) Figshare, SND Doris, Dryad
  • In-house/institutional
    • For archive/backup purpose mainly, might cost, limited reach unless also published in a data catalogue

Domain-specific

General purpose

In-house

10 of 68

Data catalogues and registries

Output specific registries

WorkflowHub - a registry for describing, sharing and publishing scientific computational workflows

Protocols.io a platform for organising, describing and sharing methods

ELIXIR TeSS - a registry for training materials

Data repositories

FAIRsharing.org/databases - catalogue of many repositories, with possibility to filter on e.g. domain

Scientific Data Repository Guidance - publisher’s recommendation

re3data.org - registry of research data repositories

Swedish data

researchdata.se - a national web portal where you can find, share, and reuse research data from a wide range of disciplines

dataportalen.se - a national web portal where you can find, share, and reuse data beyond research data

11 of 68

Restricted access repository: FEGA Sweden

  • National repository for storing and sharing personally identifiable information from Swedish biomedical research projects.

12 of 68

FAIR principles

FAIR principles illustration. SND/Svensk nationell datatjänst. CC-BY 4.0

13 of 68

The FAIR data guiding principles

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

F2. data are described with rich metadata (defined by R1 below)

F3. metadata clearly and explicitly include the identifier of the data it describes

F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

A1.1 the protocol is open, free, and universally implementable

A1.2 the protocol allows for an authentication and authorization procedure, where necessary

A2. metadata are accessible, even when the data are no longer available

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

To be Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (meta)data are released with a clear and accessible data usage license

R1.2. (meta)data are associated with detailed provenance

R1.3. (meta)data meet domain-relevant community standards
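
Some of these principles can be checked mechanically on a metadata record. The following is a minimal sketch, not a FAIR assessment tool: the field names (`identifier`, `license`, `provenance`) and the example DOI are assumptions made for illustration, and real repositories define their own schemas.

```python
# Hypothetical field names; real repositories define their own schemas.
REQUIRED_FIELDS = {
    "F1": "identifier",    # globally unique and persistent identifier
    "R1.1": "license",     # clear and accessible data usage license
    "R1.2": "provenance",  # detailed provenance
}


def missing_principles(record: dict) -> list[str]:
    """Return the principle labels whose field is absent or empty."""
    return [
        principle
        for principle, field in REQUIRED_FIELDS.items()
        if not record.get(field)
    ]


record = {
    "identifier": "https://doi.org/10.1234/example",  # made-up DOI
    "title": "Example dataset",
    "license": "CC-BY-4.0",
}

print(missing_principles(record))  # ['R1.2']
```

A check like this only shows that a field is filled in. Whether the content is rich and accurate still needs human judgement.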

14 of 68

FAIR does not mean open

FAIR and OPEN image from CESSDA Data Archiving Guide version 4.0. CESSDA Training Team (2025). CC-BY 4.0

15 of 68

Metadata

Open Science in the project lifecycle - access & reuse

16 of 68

Metadata

The data about the data (or anything really)

“One person’s metadata is another person’s data”

17 of 68

Impactful metadata

  • Human readable
  • Standardised
  • Machine readable
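
To make the three properties concrete, the sketch below writes the same information once as free text and once as a structured record. The values (place, date, collector) and the field names are invented for illustration; the ISO 8601 date format and JSON are the only standards assumed here.

```python
import json
from datetime import date

# Human readable only: easy for a person, ambiguous for a program.
free_text = "Samples collected in Strängnäs on the 3rd of May 2024 by J. Doe"

# Machine readable and standardised: named fields, ISO 8601 date.
record = {
    "locality": "Strängnäs",
    "collection_date": date(2024, 5, 3).isoformat(),
    "collector": "J. Doe",
}

serialised = json.dumps(record, ensure_ascii=False, sort_keys=True)
print(serialised)

# A program can read the value back without guessing at the wording.
parsed = json.loads(serialised)
collected = date.fromisoformat(parsed["collection_date"])
print(collected.year)  # 2024
```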

18 of 68

Exercise instructions

In groups:

  1. Examine the packaging of the food item
  2. List the information you find
    1. Think broadly
  3. Consider why each piece of information is included in the packaging

19 of 68

Exercise reflection

  • Share 1-2 interesting pieces of information you found on the packaging
  • How would these elements relate to research metadata?
  • Going back to the previous slide, could you identify something that could be described as standardised metadata or machine readable metadata?

20 of 68

Elements relating to research metadata

  • Product Name → Research Title
  • Ingredients → Methods/Materials
  • Best Before Date → Creation Date/Version
  • Manufacturer → Author/Affiliation

21 of 68

Human readable

Machine readable

Standardised

22 of 68

Metadata at different levels

?

Source: Openclipart

?

.fastq

Source: Openclipart

Source: Openclipart

This slide, including all images, is in the public domain and free to reuse.

23 of 68

Metadata in the Data Life Cycle

Determine what metadata is needed for doing the project and sharing data

Collecting metadata

Describing datasets

Using metadata to perform analysis

Describing datasets

Understanding datasets

Organising metadata in appropriate format(s)

24 of 68

The beauty of formalised metadata

  • Describe the same thing! (i.e. Contextual terms)

  • Use the same words! (i.e. Recommended terminologies)

  • With good levels of detail! (i.e. Metadata templates)

Ok, but what is that?

Source: Openclipart

Repositories are your friends!

This slide, including all images, is in the public domain and free to reuse.

25 of 68

Controlled Vocabularies & Ontologies

Controlled Vocabularies

  • A predefined, organised list of authorised terms
  • Ensures consistency in data description
  • Examples: Species taxonomies, Medical Subject Headings (MeSH)

Ontologies

  • A structured framework showing relationships between concepts/terms
  • Supports complex data integration and interoperability

Benefits as Metadata Descriptors

  • Improved Data Consistency: Reduces ambiguity and enhances searchability
  • Enhanced Interoperability: Facilitates data sharing across systems
  • Efficient Data Management: Streamlines classification and retrieval processes
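
As a rough illustration of how a controlled vocabulary reduces ambiguity, the sketch below maps free-text organism names onto one authorised term. The vocabulary and its synonyms are made up for this example and are not taken from any real taxonomy service.

```python
# A tiny, made-up controlled vocabulary: authorised term -> accepted synonyms.
ORGANISM_VOCABULARY = {
    "Mus musculus": {"mouse", "house mouse", "mice"},
    "Homo sapiens": {"human"},
}


def normalise(term: str) -> str:
    """Map a free-text term onto its authorised term, or raise ValueError."""
    cleaned = term.strip().lower()
    for authorised, synonyms in ORGANISM_VOCABULARY.items():
        if cleaned == authorised.lower() or cleaned in synonyms:
            return authorised
    raise ValueError(f"{term!r} is not in the controlled vocabulary")


print([normalise(t) for t in ["Mouse", "mus musculus", " mice "]])
# ['Mus musculus', 'Mus musculus', 'Mus musculus']
```

Rejecting unknown terms, rather than storing them as typed, is what keeps later searches consistent.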

This slide, including all images, is in the public domain and free to reuse.

26 of 68

Metadata transformation

A simple example:

Place

Strängnäs

One location

=

One metadata value

Does anyone need to know more?

This slide, including all images, is in the public domain and free to reuse.

27 of 68

Metadata transformation

Geographic location (country and/or sea)

Geographic location (region and locality)

Geographic location (latitude)

Geographic location (longitude)

Sweden

Strängnäs

59.29

17.12

Yes!

More detail lets others (and your future self) know what you have done

Rich metadata is the key to scientific reliability!
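
The transformation from one free-text value to four structured fields could be sketched as follows. The class and attribute names are hypothetical, and reading the coordinates as decimal degrees is an assumption.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GeographicLocation:
    """Hypothetical container for the four geographic fields."""

    country: str
    region_and_locality: str
    latitude: float   # assumed to be decimal degrees
    longitude: float  # assumed to be decimal degrees

    def __post_init__(self) -> None:
        # Reject values that cannot be valid coordinates.
        if not -90 <= self.latitude <= 90:
            raise ValueError("latitude must be between -90 and 90")
        if not -180 <= self.longitude <= 180:
            raise ValueError("longitude must be between -180 and 180")


location = GeographicLocation("Sweden", "Strängnäs", 59.29, 17.12)
print(asdict(location))
```

Compared with the single value "Strängnäs", each field can now be validated, searched and compared on its own.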

This slide, including all images, is in the public domain and free to reuse.

28 of 68

Persistent Identifiers

Open Science in the project lifecycle - access & reuse

This presentation originates from session 8 of the ELIXIR FAIR training material by Design course. DOI:10.5281/zenodo.13773159

29 of 68

What is a unique persistent identifier (PID)?

Identify

Verify

Locate

30 of 68

JANE DOE, HIGH ST 5

NORWAY

JANE DOE, MILL RD 1

SWEDEN

JANE SMITH, CASHEW WAY

GERMANY

JANE

31 of 68

Features of PIDs

  • Globally unique
    • It should comply with a controlled syntax to avoid clashes

  • Persistent
    • It should be maintained for a long period of time. The syntax used for the identifier should also be persistent

  • Resolvable
    • It should allow both human and machine users to access the resource

32 of 68

Are you familiar with some persistent identifiers?

33 of 68

Digital Object Identifier - DOI

Illustration from the DOI Foundation's website: https://www.doi.org/the-identifier/what-is-a-doi/

34 of 68

Digital Object Identifier - DOI

Illustration from The DOI Handbook April 2023 (https://doi.org/10.1000/182 identifies the latest current version of the handbook)
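
A DOI consists of a prefix and a suffix separated by a slash, and it is made resolvable by placing it after https://doi.org/. The sketch below splits a DOI and builds the resolver URL. The regular expression is a deliberately simplified assumption (real DOI syntax is more permissive), and the function names are hypothetical.

```python
import re

# Simplified pattern: "10." + registrant code + "/" + suffix.
# Real DOI syntax is more permissive; this is an assumption for illustration.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")
RESOLVER = "https://doi.org/"


def parse_doi(value: str) -> tuple[str, str]:
    """Split a DOI (bare, 'doi:'-prefixed or doi.org URL) into (prefix, suffix)."""
    bare = value.strip()
    for lead in (RESOLVER, "doi:", "DOI:"):
        if bare.startswith(lead):
            bare = bare[len(lead):]
    match = DOI_PATTERN.match(bare)
    if match is None:
        raise ValueError(f"not a recognisable DOI: {value!r}")
    return match.group(1), match.group(2)


def resolver_url(value: str) -> str:
    """Return the https://doi.org/ URL for a DOI, without any network access."""
    prefix, suffix = parse_doi(value)
    return f"{RESOLVER}{prefix}/{suffix}"


print(parse_doi("https://doi.org/10.1000/182"))      # ('10.1000', '182')
print(resolver_url("DOI:10.5281/zenodo.13773159"))   # https://doi.org/10.5281/zenodo.13773159
```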

35 of 68

Other examples of PIDs

ORCID - Open Researcher and Contributor ID

  • persistent identifiers for researchers
  • takes homonymy into account
  • add aliases to your profile if your name changes
  • ORCID stays the same when affiliation changes

The Research Organization Registry

  • persistent identifiers for research organizations
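
An ORCID iD follows a controlled syntax: 16 characters, the last of which is a check character computed with ISO 7064 MOD 11-2. The sketch below verifies that check character. The function names are hypothetical, and the check says nothing about whether the iD is actually registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, the digits and the final check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False, wrong check character
```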

36 of 68

Accession numbers

The European Nucleotide Archive (ENA) assigns accession numbers to digital entities at various levels of granularity.
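
The level of granularity can usually be read from the accession prefix. The sketch below classifies accessions by prefix. The patterns are a simplified assumption rather than ENA's full specification, and the example accessions are made up.

```python
import re
from typing import Optional

# Simplified, assumed patterns; consult the ENA documentation for the full rules.
ACCESSION_LEVELS = [
    ("project", re.compile(r"^PRJ[EDN][A-Z]\d+$")),
    ("study", re.compile(r"^[EDS]RP\d{6,}$")),
    ("biosample", re.compile(r"^SAM[EDN][A-Z]?\d+$")),
    ("sample", re.compile(r"^[EDS]RS\d{6,}$")),
    ("experiment", re.compile(r"^[EDS]RX\d{6,}$")),
    ("run", re.compile(r"^[EDS]RR\d{6,}$")),
]


def accession_level(accession: str) -> Optional[str]:
    """Return the level an accession refers to, or None if unrecognised."""
    cleaned = accession.strip().upper()
    for level, pattern in ACCESSION_LEVELS:
        if pattern.match(cleaned):
            return level
    return None


# Made-up accessions, used only to exercise the patterns.
for example in ["PRJEB12345", "ERS123456", "ERR1234567", "XYZ1"]:
    print(example, accession_level(example))
```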

37 of 68

Benefits of PIDs

  • uniquely distinguish resources from similar objects (F)
  • a place to keep the metadata (F)
  • machine actionable identifiers increase findability (F)
  • resolve to the object itself or to information on how to access it (A)
  • enhances citability leading to easier reuse (R)

38 of 68

Licenses

Open Science in the project lifecycle - access & reuse

39 of 68

Why license your research output?

Image from Flickr, licensed CC0 (https://flic.kr/p/6kHZ9r)

Open Licenses for Data, Code, and Software Pilot – CC-BY SA – 19.06.2024

40 of 68

Why license your research output?

Image from Flickr, licensed CC0 (https://flic.kr/p/6kHZ9r)

FREE

to a good home

41 of 68

Why license your research output?

A license is a clear invitation for others to REUSE your research data or software – on your terms.

Image from Flickr, licensed CC0 (https://flic.kr/p/6kHZ9r)

FREE

to a good home

42 of 68

Why license your research output?

FAIR principles for research data → R1.1. (meta)data are released with a clear and accessible data usage license.

FAIR principles for research software → R1.1. Software is given a clear and accessible license.

43 of 68

Why license your research output?

Make the property rights clear & access and usability rules evident

  • Protection of one’s rights
  • More citations
  • More reuse
  • Greater impact

44 of 68

What types of licences have you come across?

45 of 68

Overview of available open licenses

There are numerous licenses, some of which can be customized to meet specific needs, while others are highly specific.

46 of 68

Creative Commons

47 of 68

Creative Commons - before licensing

  • The licenses and CC0 mark cannot be revoked
    • Updates only apply to “new users”
  • Only the copyright holder can apply a CC license
    • If the work was created as part of your job, you might not be the holder

48 of 68

Creative Commons

Core conditions:

  • Attribution
  • ShareAlike
  • NonCommercial
  • NoDerivatives

Public domain tools:

  • CC0: No rights reserved
  • PDM: No known copyright
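
The core conditions combine into licence codes such as CC-BY-SA. The sketch below expands such a code into its conditions. It uses the common abbreviations BY, SA, NC and ND; the function name is hypothetical, and the parser is illustrative rather than a validator of official licence names.

```python
CONDITIONS = {
    "BY": "Attribution",
    "SA": "ShareAlike",
    "NC": "NonCommercial",
    "ND": "NoDerivatives",
}


def describe_cc_license(code: str) -> list[str]:
    """Expand a code such as 'CC-BY-NC-SA 4.0' into its core conditions."""
    parts = code.strip().upper().replace(" ", "-").split("-")
    if parts[0] == "CC0":
        return ["No rights reserved"]
    if parts[0] != "CC":
        raise ValueError(f"not a Creative Commons code: {code!r}")
    conditions = []
    for part in parts[1:]:
        if part in CONDITIONS:
            conditions.append(CONDITIONS[part])
        elif part.replace(".", "").isdigit():
            continue  # version number such as 4.0
        else:
            raise ValueError(f"unknown element {part!r} in {code!r}")
    if not conditions:
        raise ValueError(f"no conditions found in {code!r}")
    if "ShareAlike" in conditions and "NoDerivatives" in conditions:
        # No CC licence combines these two conditions.
        raise ValueError("ShareAlike and NoDerivatives cannot be combined")
    return conditions


print(describe_cc_license("CC-BY-SA 4.0"))  # ['Attribution', 'ShareAlike']
print(describe_cc_license("CC0"))           # ['No rights reserved']
```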

49 of 68

Creative Commons licences

Creative commons license spectrum by Shaddim; original CC license symbols by Creative Commons. CC-BY-SA 4.0

50 of 68

Mission

  • Coordinate and support the digitalisation of public administration.
  • Be responsible for Sweden's digital infrastructure.
  • Follow up and analyse the digitalisation of society.
  • Help the government make well-informed decisions.

51 of 68

Creative Commons licences - recommendations

information/data not subject to copyright or IP protection

information/data subject to copyright

authorities should not impose this type of limiting condition or control compliance

The Agency for Digital Government (DIGG):

Creative commons license spectrum by Shaddim; original CC license symbols by Creative Commons. CC-BY-SA 4.0

52 of 68

Tools for choosing a license

  • Choose a license: https://choosealicense.com/

53 of 68

Licences vs Terms of use

Disadvantages:

  • Not standardised
  • Not machine readable

54 of 68

Licences vs Terms of use

Monitoring progress towards standardised licensing

  • Standardise the licences used across EMBL-EBI resources.
  • Use licences that present the lowest barriers to data reuse. CC0 is preferred over CC-BY.
  • State the licence explicitly both on the resource and at the record level.

55 of 68

Restricted access: EGA example

56 of 68

Restricted access: EGA example

Lawson et al., 2021, Cell Genomics 1, https://doi.org/10.1016/j.xgen.2021.100028

57 of 68

58 of 68

59 of 68

Let's try this out!

60 of 68

61 of 68

Reflections

  • Were some metadata fields difficult to provide an answer to?
  • Did something surprise you with the metadata requirements in Zenodo?

62 of 68

Repositories are key for sharing and reuse

Why use repositories?

When to start thinking about repositories?

63 of 68

Repositories are key for sharing and reuse

Why use repositories?

  • Helps you adhere to the FAIR principles

When to start thinking about repositories?

  • Early on
    • File formats
    • Metadata fields
    • Lightweight backup

64 of 68

Evaluate a repository

Things to check when evaluating:

  • Are others in the community using it?
  • Is it easy to navigate / user-friendly?
  • Is there support / guidance for submission and reuse?
  • Is it sustainable, i.e. will the repository be around for a while?
  • Will the datasets obtain persistent identifiers? Is the repository itself FAIR?
  • What are the usage rights or license options?

65 of 68

Identify repositories

How to find a suitable repository for life science data?

66 of 68

Demo: EBI Repository Wizard

Which repository would be suitable if you have a genomics project with mouse RNA sequences?

  • Go to https://www.ebi.ac.uk/submission/
  • Answer the questions regarding
    • data type
    • need for controlled access
    • whether the data were experimentally produced by you
    • type of study

67 of 68

Key Points

  • Rich metadata is the key to scientific reliability and reuse.
  • No license - no reuse (without asking for permission).
  • Persistent identifiers allow an object to be identified, verified and located.
  • Using repositories increases FAIRness and reuse:
    • Identifying a repository early on helps to formalise metadata from the start.
    • If possible, use a domain-specific repository since it has maximum reach in the research community.
    • The type of research output (data) determines which domain-specific repository is suitable.

68 of 68

Intro DM course