Open Science in the project lifecycle - access & reuse
Elin Kronander
Open Science in the Swedish context, 2025-05, Stockholm
Repositories are key for sharing and reuse
Reuse - open is not enough
What do we need in order to be able to reuse a research output?
Screenshot from the SciLifeLab Data Repository
Types of repositories
Domain-specific
General-purpose
In-house
Data catalogues and registries
Output specific registries
WorkflowHub - a registry for describing, sharing and publishing scientific computational workflows
Protocols.io - a platform for organising, describing and sharing methods
ELIXIR TeSS - a registry for training materials
Data repositories
FAIRsharing.org/databases - catalogue of many repositories, with the possibility to filter by, for example, domain
Scientific Data Repository Guidance - publisher’s recommendation
re3data.org - registry of research data repositories
Swedish data
researchdata.se - a national web portal where you can find, share, and reuse research data from a wide range of disciplines
dataportalen.se - a national web portal where you can find, share, and reuse data beyond research data
Restricted access repository: FEGA Sweden
FAIR principles
FAIR principles illustration. SND/Svensk nationell datatjänst. CC-BY 4.0
The FAIR data guiding principles
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
FAIR does not mean open
FAIR and OPEN image from CESSDA Data Archiving Guide version 4.0. CESSDA Training Team (2025). CC-BY 4.0
Metadata
The data about the data (or anything really)
“One person’s metadata is another person’s data”
Impactful metadata
Exercise instructions
In groups:
Exercise reflection
Elements relating to research metadata
Human readable
Machine readable
Standardised
Metadata at different levels
[Figure: metadata at different levels, illustrated with images including a .fastq file. Image sources: Openclipart, Publicdomainpictures.]
This slide, including all images, is in the public domain and free to reuse.
Metadata in the Data Life Cycle
Determine what metadata is needed for doing the project and sharing data
Collecting metadata
Describing datasets
Using metadata to perform analysis
Describing datasets
Understanding datasets
Organising metadata in appropriate format(s)
The beauty of formalised metadata
Ok, but what is that?
Source: Openclipart
Repositories are your friends!
This slide, including all images, is in the public domain and free to reuse.
Controlled Vocabularies & Ontologies
Controlled Vocabularies
Ontologies
Benefits as Metadata Descriptors
This slide, including all images, is in the public domain and free to reuse.
Metadata transformation
A simple example:
| Place |
| Strängnäs |
One location = one metadata value
Does anyone need to know more?
This slide, including all images, is in the public domain and free to reuse.
Metadata transformation
| Geographic location (country and/or sea) | Geographic location (region and locality) | Geographic location (latitude) | Geographic location (longitude) |
| Sweden | Strängnäs | 59.29 | 17.12 |
Yes!
More detail lets others (and your future self) know what you have done
Rich metadata is the key to scientific reliability!
This slide, including all images, is in the public domain and free to reuse.
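The transformation from a single "Place" value to a richer description can be sketched in a few lines of Python. This is only an illustration: the function name `expand_place` is hypothetical, and the field names are simply taken from the table headers in the example above.

```python
import json


def expand_place(locality, country, latitude, longitude):
    """Turn one free-text place into four structured metadata fields.

    Hypothetical helper for illustration; the keys mirror the
    column headers used in the example table.
    """
    # Basic sanity checks so that obviously wrong coordinates are rejected.
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return {
        "Geographic location (country and/or sea)": country,
        "Geographic location (region and locality)": locality,
        "Geographic location (latitude)": latitude,
        "Geographic location (longitude)": longitude,
    }


# Values from the example: the single value "Strängnäs" becomes four fields.
record = expand_place("Strängnäs", "Sweden", 59.29, 17.12)

# Serialising to JSON gives a machine-readable version of the same record.
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Keeping each piece of information in its own field is what allows others to search, filter and validate the metadata automatically.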
Persistent Identifiers
This presentation originates from session 8 of the ELIXIR FAIR training material by Design course. DOI:10.5281/zenodo.13773159
What is a unique persistent identifier (PID)?
Identify
Verify
Locate
JANE DOE, HIGH ST 5, NORWAY
JANE DOE, MILL RD 1, SWEDEN
JANE SMITH, CASHEW WAY, GERMANY
JANE
Features of PIDs
Are you familiar with some persistent identifiers?
Digital Object Identifier - DOI
Illustration from the DOI Foundation's website: https://www.doi.org/the-identifier/what-is-a-doi/
Digital Object Identifier - DOI
Illustration from the DOI Handbook, April 2023 (https://doi.org/10.1000/182 identifies the latest version of the handbook)
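A DOI consists of a prefix and a suffix separated by a slash, and it can be turned into a resolvable link by prepending https://doi.org/. The sketch below shows that structure using the two DOIs mentioned in these slides. The function names `parse_doi` and `doi_url` are hypothetical, and the check is deliberately loose; it is not a full validation of DOI syntax.

```python
def parse_doi(value):
    """Split a DOI into (prefix, suffix).

    Accepts a bare DOI, a "doi:" label, or a doi.org link.
    Illustrative helper only, not a complete DOI validator.
    """
    doi = value.strip()
    # Remove a leading resolver address or "doi:" label, if present.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    # The prefix is everything before the first slash; DOI prefixes start with "10.".
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {value!r}")
    return prefix, suffix


def doi_url(value):
    """Build the resolver link for a DOI."""
    prefix, suffix = parse_doi(value)
    return f"https://doi.org/{prefix}/{suffix}"


print(parse_doi("https://doi.org/10.1000/182"))
print(doi_url("DOI:10.5281/zenodo.13773159"))
```

The link stays the same even if the landing page moves, because the resolver looks up the current location for the identifier.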
Other examples of PIDs
ORCID - Open Researcher and Contributor ID
The Research Organization Registry
Accession numbers
The European Nucleotide Archive (ENA) assigns accession numbers to digital entities at various levels of granularity.
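Some PIDs can be partly verified without looking anything up. As an example for the ORCID iD listed above: its final character is a check digit computed from the first 15 digits with the ISO 7064 MOD 11-2 algorithm. The sketch below implements that check; the function names are hypothetical, and the two test values are the sample iDs commonly used in ORCID documentation. Passing the check only shows the iD is well formed, not that it is registered.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the iD has 16 characters and a matching check digit."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1694-233X"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A check digit like this catches most typing mistakes, which is one reason structured identifiers are more reliable than free-text names.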
Benefits of PIDs
Licenses
Why license your research output?
A license is a clear invitation for others to REUSE your research data or software – on your terms.
Image from Flickr, licensed CC0 (https://flic.kr/p/6kHZ9r)
FREE to a good home
Open Licenses for Data, Code, and Software Pilot – CC-BY SA – 19.06.2024
FAIR principles for research data → R1.1. (meta)data are released with a clear and accessible data usage license.
FAIR principles for research software → R1.1. Software is given a clear and accessible license.
Make property rights clear, and make the rules for access and use evident
What type of licences have you come across?
Overview of available open licenses
There are numerous licenses, some of which can be customized to meet specific needs, while others are highly specific.
Creative Commons
Creative Commons - before licensing
Creative Commons
Core conditions:
Attribution (BY)
ShareAlike (SA)
NonCommercial (NC)
NoDerivatives (ND)
Public domain tools:
CC0 - No rights reserved
PDM - No known copyright
Creative Commons licences
Creative commons license spectrum by Shaddim; original CC license symbols by Creative Commons. CC-BY-SA 4.0
Mission
Creative Commons licences - recommendations
The Agency for Digital Government (DIGG):
information/data not subject to copyright or IP protection
information/data subject to copyright
authorities should not impose this type of limiting condition or control compliance
Creative commons license spectrum by Shaddim; original CC license symbols by Creative Commons. CC-BY-SA 4.0
Tools for choosing a license
Licences vs Terms of use
Disadvantages:
Licences vs Terms of use
Plot from EBI https://www.ebi.ac.uk/licencing/
Monitoring progress towards standardised licensing
Restricted access: EGA example
Lawson et al., 2021, Cell Genomics 1, https://doi.org/10.1016/j.xgen.2021.100028
Let's try this out!
Reflections
Repositories are key for sharing and reuse
Why use repositories?
When to start thinking about repositories?
Evaluate a repository
Things to check when evaluating:
Identify repositories
How to find a suitable repository for life science data?
Demo: EBI Repository Wizard
Which repository would be suitable if you have a genomics project with mouse RNA sequences?
Key Points
Intro DM course