1 of 33

BASS Workshop

9 February 2023

Michael Oellermann

Publishing Research Data with PANGAEA

www.nfdi4biodiversity.org

@NFDI4Biodiv

#NFDI4Biodiv

2 of 33

Who is PANGAEA?


3 of 33

Learning goals

After the session you will know:

  1. Why it is important to share and publish research data
  2. How to prepare data for publication
  3. How to publish and cite data

https://pixabay.com/de/vectors/m%c3%a4dchen-b%c3%bccher-stapel-lesen-160172/

4 of 33

Why is it important to publish data?

5 of 33

Why publish data?


https://pixabay.com/de/vectors/mitarbeiter-schreibtisch-betonen-6038877/

6 of 33

Why publish data?

Benefits for society:

  • Preserves scientific knowledge
  • Improves access to research data for people outside academia
    → greater transparency strengthens public trust in science
  • Saves public money by reducing funding spent on repeated work

https://pixabay.com/de/photos/marmelade-orange-limette-zitrone-2497654/

7 of 33

Why publish data?

Benefits for the research community:

  • Improves access for reuse and reinterpretation of your data
  • Avoids/reduces duplication of scientific experiments → saves money and effort
  • Makes experiments easier to replicate
    → improves the robustness and correctness of your results

https://pixabay.com/vectors/checklist-collaboration-characters-150938/

8 of 33

Publishing Data - Why publish data?

Benefits for the authors:

  • Enhances your visibility and discoverability in the scientific community
    → more citations of your research articles (up to 25%) [2][3][4]
    → your data collect citations too
    → greater recognition and reputation
    → more collaborations
    → a larger (interdisciplinary) audience

9 of 33

Publishing Data - Why publish data?

Benefits for the authors:

  • Improves the transparency and credibility of your research
  • Increases your overall number of publications
  • Ensures that you AND the data collectors receive credit

10 of 33

Where to publish data?


Source: https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/hostedimages/1492246841i/22497226._SX540_.jpg

11 of 33

What are the benefits of data repositories?

  • Archive your data for the long term
  • Ensure your data remain accessible and findable
  • Provide data about data (metadata)
  • Assign persistent identifiers (e.g. DOIs)
  • Support standardised data archiving and exchange
  • Provide machine-readable data and metadata

12 of 33

Large choice of data repositories


13 of 33

Why choose PANGAEA?

  • Certified data repository
  • Established data repository (>20 years)
  • High-quality data archiving standards
  • Enhanced data curation
  • Extensive user support
  • Linked to many other services
  • Specialised in Earth and environmental data

14 of 33

Why choose PANGAEA?

15 of 33

How to publish at PANGAEA - What file formats?

  • Tabular data (txt, csv, xlsx), e.g. tab-delimited UTF-8 text
  • Supplementary files (pdf, ODF, xlsx)
  • Images (tiff, jpeg, png)
  • Video (mpeg2, mp4)
  • Audio (mp3, wav)
  • Array data (NetCDF)
  • Zip and ISO files
  • No code or models
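For tabular data, tab-delimited UTF-8 text is named as an example format. A minimal sketch of writing such a file with Python's standard `csv` module follows; the file name, column names and values are hypothetical, and the `Name [unit]` header style is only an illustration, not a stated PANGAEA requirement.

```python
import csv
from pathlib import Path

def write_tab_delimited(path, header, rows):
    """Write one header row followed by data rows as tab-delimited UTF-8 text."""
    # Explicit encoding: the platform default is not always UTF-8.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)

# Hypothetical example table; names and values are illustrative only.
header = ["Event", "Date/Time", "Latitude", "Longitude", "Temp [°C]"]
rows = [
    ["ST-01", "2022-06-01T08:30", 54.18, 7.90, 12.4],
    ["ST-02", "2022-06-01T10:15", 54.21, 7.95, 12.9],
]
out = Path("example_data.txt")
write_tab_delimited(out, header, rows)
print(out.read_text(encoding="utf-8"))
```

Passing `encoding="utf-8"` explicitly matters because the default encoding of `open()` depends on the system locale, which could otherwise garble characters such as `°`.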


16 of 33

How to publish at PANGAEA - What metadata?

Events (the context in which data were collected), e.g.:

  • station, sampling location/event
  • measurements along a transect
  • origin of organisms
  • field experiment
  • multiple investigations

Metadata for each event: event label, latitude/longitude, date/time, elevation, area, comment, device(s)
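The event metadata above can be kept as a small structured record and checked before submission. The sketch below uses a Python dataclass; the class name `Event`, its field names and the checks are hypothetical illustrations, not PANGAEA's schema or submission format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

@dataclass
class Event:
    """Hypothetical record for one sampling event (illustrative fields,
    not PANGAEA's schema)."""
    label: str
    latitude: float        # decimal degrees, -90 to 90
    longitude: float       # decimal degrees, -180 to 180
    date_time: datetime    # expected to be timezone-aware UTC
    device: str
    elevation_m: Optional[float] = None
    area: str = ""
    comment: str = ""

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        if not self.label.strip():
            issues.append("missing event label")
        if not -90 <= self.latitude <= 90:
            issues.append(f"latitude out of range: {self.latitude}")
        if not -180 <= self.longitude <= 180:
            issues.append(f"longitude out of range: {self.longitude}")
        # Naive datetimes return None here; non-UTC offsets differ from zero.
        if self.date_time.utcoffset() != timedelta(0):
            issues.append("date/time is not timezone-aware UTC")
        if not self.device.strip():
            issues.append("missing device")
        return issues

ok = Event("ST-01", 54.18, 7.90, datetime(2022, 6, 1, 8, 30, tzinfo=timezone.utc), "CTD")
bad = Event("ST-02", 95.0, 7.95, datetime(2022, 6, 1, 10, 15), "")
print(ok.problems())   # []
print(bad.problems())
```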


17 of 33

How to publish at PANGAEA - Collect metadata

More metadata:

  • Expedition/cruise/campaign
  • Author(s) of the dataset, with ORCID iDs!
  • An individual title and abstract for EACH dataset
  • Reference(s) (the publication the dataset supplements, or further details)
  • Method for each parameter
  • Keyword(s)

PANGAEA data templates

https://pixabay.com/de/illustrations/eisberg-wasser-blau-ozean-eis-1421411/

18 of 33

How to publish at PANGAEA - Use templates


Prepare and share standardized templates before data collection! (example template)
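A shared template can be as simple as an agreed list of columns, units and descriptions, written out before fieldwork starts. The sketch below is a hypothetical example: the `TEMPLATE` entries, file names and `Name [unit]` header style are assumptions, and the official PANGAEA data templates should be preferred where they fit your data.

```python
import csv

# Hypothetical template: (column name, unit, description).
# Agree on these with everyone collecting data BEFORE fieldwork starts.
TEMPLATE = [
    ("Event", "", "Event label, e.g. station ID"),
    ("Date/Time", "", "ISO 8601, UTC"),
    ("Latitude", "", "decimal degrees"),
    ("Longitude", "", "decimal degrees"),
    ("Temp", "°C", "water temperature, CTD sensor"),
]

def column_header(name, unit):
    """Build a header like 'Temp [°C]'; columns without a unit keep the plain name."""
    return f"{name} [{unit}]" if unit else name

def write_template(data_path, notes_path):
    """Write an empty data sheet (header row only) and a sheet describing each column."""
    with open(data_path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerow(
            column_header(name, unit) for name, unit, _ in TEMPLATE
        )
    with open(notes_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["Column", "Unit", "Description"])
        writer.writerows(TEMPLATE)

write_template("template_data.txt", "template_columns.txt")
```

Keeping the column descriptions in a separate sheet leaves the data sheet itself free of comments, which avoids one of the common data issues later on.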

19 of 33

?


Source: https://i0.wp.com/benjaminblonder.org/wp-content/uploads/2020/07/IMG_1535.jpg?resize=1536%2C1152&ssl=1

20 of 33

Data Cleaning

https://pixabay.com/de/illustrations/s%c3%a4ubern-fegen-aufkehren-1013734/

21 of 33

Why should you curate your data?

  • Ensures the high quality of your data
  • Improves handling and understanding of your data → more reuse
  • Your one-time effort saves others endless hours of repeated curation
    → the creator is the best curator
  • ‘Emancipates’ the data from their authors, so the data can ‘stand on their own feet’

https://pxhere.com/en/photo/1088522

22 of 33

Curate your data - Common data issues

  • Non-data content in the data table: descriptions, markings, plots, statistics, empty rows/columns, additional comments
  • Wrong data types (e.g. float, int, object)
  • Wrong date/time format; UTC vs local time
  • Wrong format of latitude/longitude
  • Ambiguous missing values (e.g. nan, N/A, -999.99)
  • Abbreviations
  • Excessive decimal places (precision beyond sensor accuracy)
  • Wrong decimal separator (‘,’ instead of ‘.’)
  • Spelling errors, e.g. in species names
  • Leading, trailing or double whitespace
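Several of these issues (stray whitespace, ambiguous missing values, a comma as decimal separator) can be normalised cell by cell. A minimal sketch in plain Python, assuming the raw cells are strings and the data contain no thousands separators (otherwise `1,234` would be read as 1.234); the set of missing-value tokens is hypothetical and must be adapted to your own data.

```python
import re

# Hypothetical missing-value markers; only include sentinels such as -999.99
# if they can never occur as real measurements in your data.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "-999", "-999.99"}

def clean_cell(value: str):
    """Normalise one raw cell: return None for missing values, a float for
    numbers, otherwise the cleaned text."""
    # Collapse runs of whitespace to one space and strip both ends.
    text = re.sub(r"\s+", " ", value).strip()
    # Map all ambiguous missing-value markers to a single explicit None.
    if text.lower() in MISSING_TOKENS:
        return None
    # Treat ',' as the decimal separator only if the cell has no '.' already.
    candidate = text if "." in text else text.replace(",", ".")
    try:
        return float(candidate)
    except ValueError:
        return text  # not a number, e.g. a species name

raw = [" 12,5", "N/A", "-999.99", "Calanus  finmarchicus ", "7.25", "nan"]
print([clean_cell(v) for v in raw])
# [12.5, None, None, 'Calanus finmarchicus', 7.25, None]
```

Returning `None` rather than a sentinel number keeps missing values unambiguous when the cleaned table is written out again.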


23 of 33

Curate your data - Common data issues


Source: https://figshare.com/
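Two of the common issues, local time instead of UTC and coordinates in degrees/minutes/seconds, can be converted in code. A minimal sketch, assuming local times in the format `DD.MM.YYYY hh:mm` recorded at a known, fixed UTC offset (daylight saving changes are not handled here; Python's `zoneinfo` module can do that when the time zone name is known). The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def local_to_utc(local_string, utc_offset_hours, fmt="%d.%m.%Y %H:%M"):
    """Parse a local date/time recorded at a fixed UTC offset and return
    it as an ISO 8601 UTC string (YYYY-MM-DDThh:mm)."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_string, fmt).replace(tzinfo=local_tz)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")

def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds and a hemisphere letter (N/S/E/W)
    to signed decimal degrees (south and west are negative)."""
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value

print(local_to_utc("01.06.2022 10:30", 2))          # 2022-06-01T08:30
print(round(dms_to_decimal(54, 10, 48, "N"), 5))    # 54.18
print(round(dms_to_decimal(7, 54, 0, "W"), 5))      # -7.9
```

Rounding the converted coordinates to a sensible number of decimals also avoids reporting precision the GPS never had.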

24 of 33

Curate your data - Common data issues


Source: https://figshare.com/

25 of 33

Curate your data - Common data issues


Source: https://figshare.com/

26 of 33

Curate your data - Common data issues


27 of 33

Curate your data - Peer-review your data


28 of 33

Data curation

  • Create a data curation checklist (in Python, R, Excel, etc.) that suits your data and target repository
    → Python Data Curation Checklist example at PANGAEA GitHub
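A curation checklist can be written as a list of named checks that are run over the whole table. The sketch below is a hypothetical, standard-library-only illustration (check names, column names and the table layout as a list of dicts are assumptions); it is not the PANGAEA GitHub example mentioned above.

```python
from datetime import datetime

# Each check takes the table (a list of dicts, one per row) and returns the
# indices of rows that fail. Names and rules are hypothetical examples.
def rows_with_untrimmed_text(rows):
    """Leading, trailing or double whitespace in any text cell."""
    return [i for i, row in enumerate(rows)
            if any(isinstance(v, str) and v != " ".join(v.split()) for v in row.values())]

def rows_with_bad_coordinates(rows):
    """Latitude/longitude missing, non-numeric or out of range."""
    bad = []
    for i, row in enumerate(rows):
        lat, lon = row.get("Latitude"), row.get("Longitude")
        if not (isinstance(lat, (int, float)) and isinstance(lon, (int, float))
                and -90 <= lat <= 90 and -180 <= lon <= 180):
            bad.append(i)
    return bad

def rows_with_non_iso_dates(rows):
    """Date/time not in the form YYYY-MM-DDThh:mm."""
    bad = []
    for i, row in enumerate(rows):
        try:
            datetime.strptime(str(row.get("Date/Time")), "%Y-%m-%dT%H:%M")
        except ValueError:
            bad.append(i)
    return bad

CHECKLIST = [
    ("No leading/trailing/double whitespace", rows_with_untrimmed_text),
    ("Coordinates numeric and in range", rows_with_bad_coordinates),
    ("Date/time in ISO 8601 format", rows_with_non_iso_dates),
]

def run_checklist(rows):
    """Run every check and return {check description: failing row indices}."""
    return {name: check(rows) for name, check in CHECKLIST}

table = [
    {"Event": "ST-01", "Date/Time": "2022-06-01T08:30", "Latitude": 54.18, "Longitude": 7.9},
    {"Event": "ST-02 ", "Date/Time": "01.06.2022 10:15", "Latitude": 54.21, "Longitude": 195.0},
]
for name, failing in run_checklist(table).items():
    print("OK  " if not failing else "FAIL", name, failing)
```

Keeping each check as a small named function makes it easy to add repository-specific rules without touching the runner.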

29 of 33

How to submit your data?


30 of 33

Data submission workflow

User → Submission → Data curation → Publication + DOI

31 of 33

How to submit your data?


32 of 33

Not quite done yet… Cite your data!

In-text citation

Cite in full in the reference list of your paper!

List each dataset as its own reference
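A dataset reference follows the same author-year pattern as an article reference, plus the repository and the DOI. A minimal sketch with hypothetical helper names; the exact punctuation shown is an assumption, so prefer the citation the repository suggests for each dataset. The authors, title and DOI in the example are placeholders.

```python
def format_dataset_citation(authors, year, title, publisher, doi):
    """Build a full reference-list entry for one dataset (hypothetical helper)."""
    author_part = "; ".join(authors)
    return f"{author_part} ({year}): {title}. {publisher}, https://doi.org/{doi}"

def in_text_citation(surnames, year):
    """Short in-text form: 'Doe (2023)', 'Doe & Roe (2023)' or 'Doe et al. (2023)'."""
    if len(surnames) == 1:
        return f"{surnames[0]} ({year})"
    if len(surnames) == 2:
        return f"{surnames[0]} & {surnames[1]} ({year})"
    return f"{surnames[0]} et al. ({year})"

# Placeholder metadata, not a real dataset.
print(format_dataset_citation(["Doe, J", "Roe, R"], 2023, "Example dataset title",
                              "PANGAEA", "10.xxxx/placeholder"))
print(in_text_citation(["Doe", "Roe", "Poe"], 2023))   # Doe et al. (2023)
```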

33 of 33


https://pxhere.com/en/photo/484552