1 of 43

This presentation is a part of Aalto University’s webinar series on Research Data Management & Open Science

Spring 2022

2 of 43

Introduction to research data management (RDM)

Dr Essi Viitanen, Research Services

3 of 43

Content

  • What is RDM and what is research data
  • Why RDM is important
  • What are the requirements from funders and publishers on RDM
  • RDM lifecycle
  • What services Aalto provides for research data management

4 of 43

Types of research data

Code, medical data, drawings, statistical models, production processes…

5 of 43

Research Data

is any information that has been collected, observed, generated or created to validate research findings.

6 of 43

Research Data Management (RDM)

refers to the organization, storage, sharing, and preservation of research data.

Data management practices depend on:

  • How the data will be generated and what analysis methods will be used.
  • What instruments and software will be employed.

7 of 43

Why is RDM important?

    • It leads to more efficient research practices that will save time for research.
    • Helps you and your colleagues find, understand and communicate about the data.
    • Improves transparency and possibilities to verify and reproduce the research results.
    • Enables the re-use, sharing and opening of data.
    • Builds knowledge and skills that are required in research, in scientific and technical organisations, and in companies.

8 of 43

Open Data requirements

  • Most funders foster open data and good data management practices
  • It is recommended to open the data whenever possible
  • If you can't open your data, it is recommended that you open at least the metadata
  • Some journals also demand open data
  • Academy of Finland and EU Horizon requirements
    • Data management plan: shows that you can manage your data
    • Be prepared to publish your data if there are no justified reasons to keep it closed, e.g. data obtained from a company and no right to open it
  • You will meet these demands by following the Aalto Open Science and Research Policy and Aalto RDM guidelines

9 of 43

The lifecycle of data

  1. Plan
  2. Consider ethics & legal issues
  3. Organize & document
  4. Store & share
  5. Publish
  6. Report & archive

10 of 43

Plan

11 of 43

A DMP (data management plan) describes what kind of data you will generate and how you will handle and manage it.

  • Write the plan alongside your research plan, before you generate or collect the data.
  • Pay special attention to the legal and ethical issues if you will:
    • Use/generate personal data
    • Obtain data from external sources, e.g. companies
    • Aim at inventions/patents

Why write a DMP?

  • The plan helps you to be systematic in managing your data, and it will save you time.
  • The funders require plans. Check their guidelines in advance.

12 of 43

Questions a DMP should answer

  1. What is your data about?
    • What is the general subject and range or scope?
    • How will you generate/obtain and process/analyze your data?
  2. How will you describe your data?
    • What methods will you use? Readme file, codebook…?
    • How will you describe and organise your data?
  3. What are the ethical and legal aspects you should consider, and how will you take them into account?
  4. How will you store and share your data?
  5. Are you going to open your data?
    • If you open the data, how will you do that?
    • When will you open the data?
    • If you don’t open your data, what are the reasons?

Update the plan during the project

13 of 43

Resources available

Data management planning tools:

  • DMPTuuli: Finnish funders’ and general templates
  • DS Wizard: bank of RDM questions
  • DMPonline by the Digital Curation Centre
  • Argos by OpenAIRE

Aalto data management planning (DMP) guidelines: follow the guidelines and benefit from the DMP templates on the page.

DMP review service at Aalto: send your DMP to researchdata@aalto.fi

14 of 43

Consider ethics & legal issues

15 of 43

Special data types

Personal data & Sensitive data

  • Personal data includes information that identifies a person directly or indirectly: name, ID number, location data, IP address, one or more factors specific to the physical, psychological, genetic, cultural or social identity of a natural person
  • Sensitive personal data: racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data, data concerning health, or data concerning a person's sex life or sexual orientation

Confidential data

  • E.g. data obtained from a company that covers commercial interests and trade secrets.
  • Take account of the data collection and handling practices based on legislation, agreements and ethical guidelines, e.g. to protect the research participants' rights and respect the ownership of data.

16 of 43

Working with personal/sensitive/confidential data

Before collecting...

  • Plan the life cycle of personal data handling.
  • If you will collect sensitive personal data, get an ethical pre-review.
  • Clarify the procedures you should use to inform research subjects about their rights.

Collecting, storing and sharing...

  • Use secure online data collection methods. 
  • Use storing and sharing services that suit personal data and enable secure sharing. 
  • Define who has the right to access the data.

Publishing...

  • Check that you can publish your data.
  • Prepare your data for publishing, e.g. use anonymization and pseudonymization methods.
  • Define access rights in the repository so that they suit your data.
  • If you cannot publish the data, you may publish the description of your data.
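
As an illustration of the pseudonymization step mentioned above, here is a minimal Python sketch that replaces a direct identifier with a keyed hash (HMAC-SHA256). This is one possible technique, not a method prescribed by Aalto or by the funders. The function names, the `name` field and the `P-` prefix are assumptions made for the example. Note that pseudonymized data can still be personal data: whoever holds the key can re-create the link, and indirect identifiers left in the records may still identify a person.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Hypothetical format: a short prefix plus the first 12 hex characters.
    return "P-" + digest[:12]


def pseudonymize_records(records, secret_key: bytes, field: str = "name"):
    """Return copies of the records with the direct identifier replaced."""
    cleaned = []
    for record in records:
        copy = dict(record)  # leave the original record untouched
        copy[field] = pseudonymize(record[field], secret_key)
        cleaned.append(copy)
    return cleaned
```

The same identifier always maps to the same pseudonym under one key, so records belonging to one participant stay linked. Store the key separately from the data, in a place suited to confidential material.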

17 of 43

Guidance on Ethics & Legal issues

Aalto guidelines to meet the ethical and legal requirements.

Benefit from the guidelines provided by the Finnish Social Science Data Archive (e.g. anonymization and minimizing the collection of personal information).

The guides help ensure compliance with legal and ethical rules.

18 of 43

Organize & Document

19 of 43

Why Organize & Document?

  • Will you understand your data in ~5 years?
  • Will the group leader understand your data (not in 5 years, but now)?
  • Will someone else be able to continue the project in the future based on your data and documentation/metadata?

Ensure that you and others can find, use, and properly cite your data by naming and organizing your data wisely and describing them well.

Good documentation decreases the risk of false interpretation of the data.

Documentation practices are specific to the field and the data.

20 of 43

Organize

Develop a logical directory/folder structure

  • Name folders appropriately.
  • Separate ongoing and completed work.
  • Schedule a time to clean unnecessary files out of the folders.
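
A folder skeleton like the one described above can be created with a short script so that every project starts from the same layout. The following Python sketch is only an illustration: the folder names in `PROJECT_LAYOUT` and the function name are assumptions, not an Aalto convention.

```python
from pathlib import Path

# Hypothetical folder names; adapt them to your own project.
PROJECT_LAYOUT = [
    "docs",            # README, naming conventions, data management plan
    "data/raw",        # original data, never edited in place
    "data/processed",  # cleaned or derived data
    "work/ongoing",    # analyses in progress
    "work/completed",  # finished work, kept apart from ongoing work
]


def create_project_folders(root: Path) -> list[Path]:
    """Create the folder skeleton under root and return the folder paths."""
    created = []
    for relative in PROJECT_LAYOUT:
        folder = root / relative
        # exist_ok=True makes the function safe to run again later.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_folders(Path("my_project"))` would create the folders under `my_project` in the current directory.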

Use consistent file naming conventions

  • Use meaningful names
  • Decide which elements should come first to make files easy to find
  • Agree on elements: date, experiment name, method, project identifier, version, subset, source identifier etc.
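
A naming convention is easier to follow when the names are generated rather than typed by hand. The Python sketch below builds a name in the pattern of the example on this slide (2020-04-30_SimulationsCuF_REP_Mikhail_v.1). The parameter names are assumptions: the slide does not say what each element of the example stands for, so `label` in particular is a guess.

```python
from datetime import date


def build_file_name(day: date, experiment: str, label: str, creator: str, version: int) -> str:
    """Join the agreed elements with underscores, ISO date first.

    Putting the date first in YYYY-MM-DD form makes files sort chronologically.
    """
    elements = [day.isoformat(), experiment, label, creator, f"v.{version}"]
    for element in elements:
        # Underscore is the separator, so it may not appear inside an element.
        if "_" in element or " " in element:
            raise ValueError(f"element must not contain spaces or underscores: {element!r}")
    return "_".join(elements)


print(build_file_name(date(2020, 4, 30), "SimulationsCuF", "REP", "Mikhail", 1))
# prints 2020-04-30_SimulationsCuF_REP_Mikhail_v.1
```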

Make a supplementary document describing the naming and structures of folders and files.

[Figure: example project showing the folder structure, a file name (2020-04-30_SimulationsCuF_REP_Mikhail_v.1) and supplementary documents. Example source: Mikhail Kuklin / Aalto]

21 of 43

Places for document info

Information included within data files

  • Variable names in a spreadsheet.
  • Descriptive headers or summaries.
  • Descriptive information at the beginning of interviews.

Supplementary documentation

Metadata standards

  • Basic bibliographic information that data repositories/archives ask for when you upload data (title, creators, …).
  • Discipline-specific standards to describe data, e.g. DDI, cultural heritage, crystallographic data.
  • General standards for describing research activity and for exchanging, identifying and describing research outputs.

22 of 43

Project folder

23 of 43

Readme files

24 of 43

Readme files details

  • TITLE
  • CREATOR and affiliation
  • DESCRIPTION of the data package and folder overview.
  • LOCATION or coverage
  • METHODOLOGY: How the data was generated, including equipment or software used, experimental protocol/method, other things that are important to understand your data (might be included in a lab notebook)
  • IDENTIFIER even if it is just an internal project reference number.
  • DATES
  • SUBJECT or Keyword
  • FORMATS of dataset files
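
The fields above can be turned into a simple template so that no field is forgotten. The Python sketch below is a hypothetical helper, not an official Aalto readme format: it renders one line per field and marks unfilled fields with TODO. The function names and the TODO marker are assumptions.

```python
# Field names taken from the list above.
README_FIELDS = [
    "TITLE", "CREATOR", "DESCRIPTION", "LOCATION", "METHODOLOGY",
    "IDENTIFIER", "DATES", "SUBJECT", "FORMATS",
]


def missing_fields(values: dict) -> list:
    """Return the readme fields that have no value yet."""
    return [name for name in README_FIELDS if not values.get(name, "").strip()]


def render_readme(values: dict) -> str:
    """Render the readme text, one 'FIELD: value' line per field."""
    lines = []
    for name in README_FIELDS:
        text = values.get(name, "").strip() or "TODO"
        lines.append(f"{name}: {text}")
    return "\n".join(lines) + "\n"
```

Writing the result of `render_readme(...)` to a `README.txt` in the project folder gives a starting point, and `missing_fields(...)` can be checked before the data is published.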

25 of 43

Supplementary files

26 of 43

Supplementary files details

1.3 Research Data Structure and Collection

Country: Finland

Target area: Finland

Observation / unit type: Person, Organization

Population / sample: people working in urban planning

Date of data collection: 4.10.2013 - 19.12.2013

Collectors: Schulman, Harry (University of Helsinki. Department of Geosciences and Geography); Faehnle, Maija (University of Helsinki)

Data collection technique: Target group discussion: face-to-face discussion, Target group discussion: phone conversation

Collection tool or instruction: Interview themes or interview body

Temporal coverage of the material: 2013

Time dimension of the research: Cross-sectional data

Observation / selection of data units: Non-probability sampling: discretionary sampling

Most of the interviewees were selected from the Helsinki metropolitan area, because the aim was to obtain data comparable with similar material collected in the Stockholm region. Because a nationwide green structure planning guide was also being developed, interviewees were selected from outside the Helsinki metropolitan area as well. The intention was to get representatives from the public, private and third sectors. Two interviews were conducted by telephone, the others face to face in groups of 1-3 people. The interviewees were sent the interview outline in advance.

Amount of material: 25 interviews in txt and html files. The duration of the interviews varied slightly less than an hour and a half.

APPENDIX A: Research Questionnaire

27 of 43

Data files

28 of 43

Data files details

Details of data

29 of 43

Store & Share

30 of 43

Storing and Sharing

Things to consider:

  • Is your data being backed up?
  • Who has access to the data?
  • Is your data stored safely and securely?
  • Is the way you share the data safe and secure?

Check the detailed instructions about the security level of services: Quick guide to information classifications and services

31 of 43

Storage services for research data

Aalto Network Drive

  • Large, fast, secure, and backed up
  • The best place to save working data in Aalto
  • Like folders on your computer but more secure
  • Aalto account needed
  • Remote access with VPN
  • To get a folder for your group or more space, contact servicedesk@aalto.fi
  • Suitable for personal and confidential data.

Personal storage space (home.org.aalto.fi)

  • No sharing
  • Available to everyone on an Aalto workstation.

Departmental storage space (work.org.aalto.fi)

  • Allows sharing inside the department, e.g. with your supervisor.

Research groups’ storage space (teamwork.org.aalto.fi)

  • Allows sharing within Aalto

Help: ITS servicedesk & service descriptions for more details.

32 of 43

Aalto Cloud Services

Microsoft Teams

Collaboration tool: web meetings, file sharing, etc.

Cloud storage: Microsoft OneDrive, Google Drive and Dropbox

Easy sharing of non-confidential data

Use Aalto account to get more space and features.

eDuuni

An e-work and collaboration service environment.

Suitable for personal and confidential data.

Examples:

  • Interview data containing sensitive personal information, or confidential data obtained from a company, must not be stored or shared via cloud services. Solution: Aalto network drives
  • Experimental data from the lab can be stored and shared via cloud services

33 of 43

Publish

34 of 43

Publishing your data

  • Open data is stored and published in a data repository/archive.
  • Data in the repositories is available to anyone to find and reuse.
  • The owner of the data decides how open the data will be: open to all; embargoed; open by request
  • The owner of the data decides what license to use to open the data, e.g. CC license.

Why publish your data?

  • Published data can be re-used easily by others.
  • Only published data counts as an independent research output, like a scientific article.
  • Published data can be cited, giving credit to the creators and other contributors.
  • Citation advantage: your article may get more citations if you link open data to it.
  • Funders typically require open data, and some journals demand it as well.

35 of 43

Repositories

General repositories cover different types of data and research outputs.

Domain-specific repositories are designed for a specific data type, e.g. FSD.

Catalogue of repositories: http://re3data.org/

36 of 43

Examples of published Aalto datasets

37 of 43

Report your data in ACRIS

  • Send the link to your published data (e.g. in Zenodo) to researchdata@aalto.fi, and the information about your data will be added to ACRIS, Aalto's research information system
  • Special case: for personal/confidential data, only the description (metadata) is published. This is a requirement in Academy of Finland funded projects

Why?

  • Ensures that your data can be identified like your articles
  • Maximizes the visibility of your data
  • You can easily include the data in your CV
  • The data will be linked in ACRIS to the related articles and other available research outputs

38 of 43

Archive

39 of 43

Archive

  • Concerns the most significant and valuable research data
  • Data will be archived for a very long period – for decades or centuries
  • The selection should follow guidelines and criteria, e.g. scientific and/or historical value, uniqueness
  • The national service for long-term preservation in Finland is provided by CSC: https://www.fairdata.fi/en/fairdata-pas/
  • More information: http://www.digitalpreservation.fi/en

40 of 43

Training available

Themes of upcoming training:

    • Data Management Plans
    • Handling of Personal Data
    • Introduction to GitHub
    • Sharing research data through a repository
    • Working with Restricted Datasets
    • Data Anonymization
    • Legal Aspects of Research Data
    • How to Store Research Data
    • Making your research/code reproducible and reusable
    • Academic publishing: Plan S and overlay journals

Further information on RDM trainings and events including previous webinars and materials

RDM Guidelines & other materials: https://www.aalto.fi/RDM 

41 of 43

Help available

Data Agents (researchers who advise on research data management)

  • Raise awareness on RDM
  • Help figure out where to store your data safely
  • Advise on where to publish your data (data repositories)
  • Promote open data and research
  • Help devise Data Management Plans (DMP) for funders such as the Academy of Finland

Data Agents are available to help on Zoom on Wednesdays 13-14.

Email RDM questions to researchdata@aalto.fi

Data agents, IT experts, Legal Counsels, Information Specialists

42 of 43

References

Images

Slide 1: Jan Antonin Kola at Unsplash

Slide 4 left to right: Markus Spiske at Unsplash, Cara Shelton at Unsplash, Balazs Ketyi at Unsplash, Kobu Agency at Unsplash, Sigmund at Unsplash

Slide 7: Wonderlane at Unsplash

Data reference, slides 22-28

Faehnle, Maija & Schulman, Harry & Söderman, Tarja & Kopperoinen, Leena & Hirvensalo, Jenni (2016). ”Kaupunkiseutujen viherrakenteen suunnittelu 2013”. Versio 1.0 (2016-08-18). Yhteiskuntatieteellinen tietoarkisto. http://urn.fi/urn:nbn:fi:fsd:T-FSD3080.

43 of 43

Thank you!