Data management best practices
March 2024
Dayane Araújo
www.ebi.ac.uk/training
Welcome to this virtual course
Course and events organisers
After this session you should be able to
Helpful information
Course handbook
Bookmark it!
How to ask questions
The importance of “good” data management
3 reasons to share your data…
Selfish
Scientific
Good Citizen
Key thing!
Start with the plan…
“Your primary collaborator is yourself six months from now, and your past self doesn’t answer e-mails.”
Rachael Ainsworth,
astrophysicist at the University of Manchester, UK
Data management plan (DMP)
EMBL requires a data management plan for every project that is funded by grants, is part of a PhD or postdoctoral project, or is intended to support a scientific article.
Creating a Data Management Plan
How to get started?
Data management checklist
Best practice tips
Organising data
File names and folder structures
Analysing data
Planning analyses
When planning an analysis, we need to consider and keep track of:
Workflows go a step beyond keeping track of these elements by allowing us to build in decisions we make, such as parameters we select, ensuring reproducibility of our analyses.
Workflows
Computational workflows allow for automation of multi-step analyses and support the reproducibility of analyses.
Workflows can include tools written in different programming languages. A workflow consists of a set of rules, each of which has an input and an output.
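To make the idea of rules with inputs and outputs concrete, here is a minimal sketch in plain Python. It is not how any particular workflow platform works: the `Rule` class, the `run_workflow` function, the `PARAMS` dictionary and the toy data are all hypothetical names chosen for illustration. It only shows that once each step declares its input and output, the order of execution can be worked out automatically and the chosen parameters are recorded in one place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One workflow step: named inputs, one named output, and an action."""
    name: str
    inputs: List[str]
    output: str
    action: Callable[..., object]


def run_workflow(rules: List[Rule], data: Dict[str, object]) -> Dict[str, object]:
    """Run each rule as soon as all of its inputs are available."""
    results = dict(data)
    pending = list(rules)
    while pending:
        ready = [r for r in pending if all(i in results for i in r.inputs)]
        if not ready:
            missing = [r.name for r in pending]
            raise ValueError(f"Cannot resolve inputs for rules: {missing}")
        for rule in ready:
            results[rule.output] = rule.action(*(results[i] for i in rule.inputs))
            pending.remove(rule)
    return results


# Hypothetical analysis parameter, kept in one place so the decision is recorded.
PARAMS = {"min_value": 10}

# Rules are deliberately listed out of order: the inputs and outputs decide the order.
rules = [
    Rule("summarise", ["filtered"], "mean", lambda xs: sum(xs) / len(xs)),
    Rule("filter", ["raw"], "filtered",
         lambda xs: [x for x in xs if x >= PARAMS["min_value"]]),
]

results = run_workflow(rules, {"raw": [4, 12, 20, 7, 16]})
print(results["filtered"], results["mean"])  # [12, 20, 16] 16.0
```

Dedicated workflow tools add much more than this sketch (file tracking, software environments, running on clusters), but the rule, input and output structure is the same basic idea.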
Tools for developing workflows
There are many different platforms and tools for creating workflows, such as:
Learn more about workflows with this short introduction.
Storing data
Storing your data
Record keeping
Good record keeping
Have you tried electronic lab notebooks or computational notebooks?
Sharing data
Good data management = good data
Choosing a data repository
Adapted from ‘Managing and making the most of your data’ – Marta Teperek and Yasemin Türkyilmaz-van der Velden
Ontologies – adding more information
Ontologies make it easier to…
[Figure: ontology diagram with the terms organism part, brain, cerebellum, hypothalamus, heart, cardiac atrium = atrium of heart, cardiac ventricle = ventricle of heart]
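The equivalent terms in the diagram can be handled in code with a simple lookup. The sketch below is only an illustration: the `SYNONYMS` table, the `normalise` function and the sample labels are hypothetical, and the choice of which label counts as the preferred one is an assumption. In practice the mapping would come from a published ontology rather than a hand-written dictionary.

```python
from collections import Counter

# Assumed synonym table, using only the terms shown in the diagram.
# Which label is "preferred" is an arbitrary choice for this sketch.
SYNONYMS = {
    "cardiac atrium": "atrium of heart",
    "cardiac ventricle": "ventricle of heart",
}


def normalise(label: str) -> str:
    """Lower-case, collapse whitespace, then map known synonyms to one term."""
    key = " ".join(label.lower().split())
    return SYNONYMS.get(key, key)


# Hypothetical free-text annotations as they might appear in sample metadata.
samples = ["Cardiac atrium", "atrium of heart", "cerebellum", "cardiac  ventricle"]

counts = Counter(normalise(s) for s in samples)
print(counts["atrium of heart"])  # 2
```

Without the mapping, "Cardiac atrium" and "atrium of heart" would be counted as two different tissues.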
Finding ontologies
What if I am only using public data?
You can still apply these practices!
How to cite data
Minimum information:
Check with the database for any specific way to cite them.
Author(s). Year. Title. Repository. (Version). Identifier
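If you cite many datasets, the pattern above can be filled in programmatically. This is a minimal sketch, assuming the fields are already known: the function name `format_data_citation`, the separator used between authors, and every value in the example (authors, title, repository, identifier) are made-up placeholders, not a real dataset or a required citation style.

```python
from typing import List, Optional


def format_data_citation(
    authors: List[str],
    year: int,
    title: str,
    repository: str,
    identifier: str,
    version: Optional[str] = None,
) -> str:
    """Build 'Author(s). Year. Title. Repository. (Version). Identifier'."""
    parts = ["; ".join(authors), str(year), title, repository]
    if version:
        # The version is optional, so it is only added when provided.
        parts.append(f"({version})")
    parts.append(identifier)
    # Strip trailing full stops so that fields are not joined with '..'.
    return ". ".join(p.rstrip(".") for p in parts)


# Placeholder values for illustration only.
citation = format_data_citation(
    authors=["Doe J", "Roe A"],
    year=2024,
    title="Example dataset",
    repository="Example Repository",
    identifier="doi:10.0000/example",
    version="v2",
)
print(citation)
# Doe J; Roe A. 2024. Example dataset. Example Repository. (v2). doi:10.0000/example
```

The database's own citation guidance takes priority over this generic pattern.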
Any questions?