1 of 37

Data Management

Camille Andrews

ceandrews01@wm.edu

Summer 2022

https://tinyurl.com/Dmsum22

2 of 37

Objectives

  • Describe key elements of data management and data management plans and why they’re important
  • Identify some good practices for data management, including file naming and creating readme style documentation
  • Start creating a data management plan of your own

3 of 37

4 of 37

Why Manage Data: Researcher Perspective

  • Manage your data for yourself:
    • Keep yourself organized – be able to find your files (data inputs, analytic scripts, outputs at various stages of the analytic process, etc.)
    • Track your science processes for reproducibility – be able to match up your outputs with exact inputs and transformations that produced them
    • Better control versions of data – easily identify versions that can be periodically purged
    • Quality control your data more efficiently

Why Data Management

5 of 37

Why Data Management: Researcher Perspective

    • Avoid data loss (e.g., by making backups)
    • Format your data for re-use (by yourself or others)
    • Be prepared: Document your data for your own recollection, accountability, and re-use (by yourself or others)
    • Gain credibility and recognition for your science efforts through data sharing!

CC image by UWW ResNet on Flickr


6 of 37

Activity: In groups of two, examine this file. Do you think you could work with this data? Why or why not?

7 of 37

A wildlife biologist for a small field office was the in-house GIS expert and provided support for all the staff’s GIS needs. However, the data were stored on her own workstation. When the biologist relocated to another office, no one understood how the data were stored or managed.

Solution: A state office GIS specialist retrieved the workstation and sifted through files trying to salvage relevant data.

Cost: 1 work month ($4,000) plus the value of data that were not recovered

Consider that the situation could have been worse, because the data were not being backed up as they would have been if stored on a server.

Poor Science Data Management Example

8 of 37

Importance of Data Management

The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.


9 of 37

Why Data Management: Foundation to Advance Science

  • Data is a valuable asset – it is expensive and time-consuming to collect
  • Data should be managed to:
    • maximize the effective use and value of data and information assets
    • continually improve data quality, including accuracy, integrity, integration, timeliness of capture and presentation, relevance, and usefulness
    • ensure appropriate use of data and information
    • facilitate data sharing
    • ensure sustainability and accessibility in the long term for re-use in science


10 of 37

Data Management Facilitates Sharing and Re-use…

11 of 37

Also see Compliance with funder mandates on W&M guide

12 of 37

Why is well-managed, publicly accessible data important?

Here are a few reasons (from the UK Data Archive):

  • Increases the impact and visibility of research
  • Promotes innovation and potential new data uses
  • Leads to new collaborations between data users and creators
  • Maximizes transparency and accountability
  • Enables scrutiny of research findings
  • Encourages improvement and validation of research methods
  • Reduces cost of duplicating data collection
  • Provides important resources for education and training


13 of 37

Well-Managed Data Can Result in Re-use, Integration, and New Science

Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States at a 35 km x 35 km grid.

Data sources: eBird, Land Cover, Meteorology, MODIS – Remote sensing data

Model results: Occurrence of Indigo Bunting (2008), shown for Jan, Apr, Jun, Sep, and Dec

Potential uses:

  • Examine patterns of migration
  • Infer impacts of climate change
  • Measure patterns of habitat usage
  • Measure population trends

Slide courtesy of DataOne


14 of 37

“Planet hidden in Hubble archives” Science News (Feb. 27, 2009)

A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (D. Lafrenière et al., Astrophysical Journal Letters)

“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.

“The second thing it tells you is having a well calibrated archive is necessary but not sufficient to make breakthroughs — it also takes a very innovative group of people to develop very smart extraction routines that can get rid of all the artifacts to reveal the planet hidden under all that telescope and detector structure.”

New Discoveries

D. Lafrenière et al., ApJ Letters


15 of 37

SangyaPundir, CC BY-SA 4.0, via Wikimedia Commons

16 of 37

What is the Data Life Cycle?

Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze


17 of 37

Plan: Create a data management plan (DMP)

18 of 37

Data management plan guide

19 of 37

Cornell University Research Data Management Service Group’s data management planning page and basic elements from DataOne

20 of 37

DMP Tool

21 of 37

Other planning practices

22 of 37

Collect: Preserve a separate copy of your raw data & use non-proprietary formats & be consistent
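
One way to keep a separate, untouched copy of raw data is sketched below in Python. This is a minimal illustration, not a prescribed workflow: the function names `sha256_of` and `preserve_raw_copy` and the idea of a dedicated `raw` folder are assumptions made for the example. The sketch copies a file, confirms the copy matches the original by checksum, and marks the copy read-only so it is not edited by accident.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_raw_copy(source: Path, raw_dir: Path) -> Path:
    """Copy `source` into `raw_dir`, verify the copy, and mark it read-only.

    `raw_dir` is a hypothetical folder reserved for untouched raw data.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source.name
    if target.exists():
        # Never overwrite an existing raw copy.
        raise FileExistsError(f"{target} already exists; refusing to overwrite raw data")
    shutil.copy2(source, target)  # copy2 also keeps file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch while copying {source}")
    # Remove write permission so the raw copy is not changed by accident.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

All cleaning and analysis would then happen on working copies, leaving the file in the raw folder as the reference version.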

23 of 37

Other collecting practices

24 of 37

Assure: Develop a Quality Assurance/Control plan

25 of 37

Other assurance practices

26 of 37

Describe: Document and describe your data (metadata)

27 of 37

Use good practices in file management
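
As one possible illustration of consistent file naming, the Python sketch below builds names from a date, a project, a short description, and a version number. The pattern itself (ISO-style date first, no spaces, zero-padded version) and the function name `build_file_name` are assumptions chosen for this example; your lab or discipline may use a different convention.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Build a name like 20220615_bird-survey_site-a-counts_v03.csv.

    The pattern is an illustrative convention, not a standard.
    """

    def clean(text: str) -> str:
        # Lower-case the text and replace runs of spaces or punctuation with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # Date first so files sort chronologically; zero-padded version so v02 sorts before v10.
    return f"{day:%Y%m%d}_{clean(project)}_{clean(description)}_v{version:02d}.{ext.lstrip('.')}"


print(build_file_name("Bird Survey", "site A counts", date(2022, 6, 15), 3, "csv"))
# prints: 20220615_bird-survey_site-a-counts_v03.csv
```

Whatever pattern you choose, writing it down (for example in your readme file) and applying it consistently matters more than the exact pattern.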

28 of 37

Create good metadata

  • Where possible, use an existing metadata standard for your discipline (see this list from the Digital Curation Centre or this community-generated list). When in doubt, talk to the library!
  • At minimum, you can create your own “read me” file: Guide to writing “read me” style metadata from Cornell University
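
A “read me” file can be as simple as a plain-text file saved next to the data. The Python sketch below shows one way to generate one. The section names (TITLE, CONTACT, FILES, VARIABLES), the function name `render_readme`, and the sample values are all assumptions for illustration and are not taken from the Cornell guide, which should be consulted for recommended content.

```python
def render_readme(title: str, contact: str, files: dict[str, str], variables: dict[str, str]) -> str:
    """Return the text of a minimal read me file.

    `files` maps file names to what they contain; `variables` maps
    column or variable names to their meaning and units.
    """
    lines = [f"TITLE: {title}", f"CONTACT: {contact}", "", "FILES"]
    lines += [f"  {name}: {purpose}" for name, purpose in files.items()]
    lines += ["", "VARIABLES"]
    lines += [f"  {name}: {meaning}" for name, meaning in variables.items()]
    return "\n".join(lines) + "\n"


# Hypothetical example values.
readme_text = render_readme(
    title="Bird counts, summer field season",
    contact="researcher@example.edu",
    files={"counts.csv": "one row per site visit"},
    variables={"site": "site code", "count": "number of birds observed"},
)
print(readme_text)
```

The returned text can be written to a file such as `README.txt` in the same folder as the data it describes.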

29 of 37

Other description best practices

30 of 37

Preserve: Use the 3-2-1 rule to back up your data (3 copies on 2 media types, at least 1 remote)
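
The 3-2-1 rule can be checked mechanically once you list where your copies live. The Python sketch below is a small illustration; the `BackupCopy` fields and the sample locations are hypothetical, and deciding what counts as a distinct media type or a remote location is left to you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str    # where the copy lives (hypothetical label)
    media_type: str  # e.g. "internal disk", "external drive", "cloud"
    remote: bool     # True if stored away from the primary site


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check for 3 copies on 2 media types with at least 1 remote."""
    media_types = {copy.media_type for copy in copies}
    has_remote = any(copy.remote for copy in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_remote


copies = [
    BackupCopy("office workstation", "internal disk", remote=False),
    BackupCopy("desk drawer", "external drive", remote=False),
    BackupCopy("institutional storage", "cloud", remote=True),
]
print(satisfies_3_2_1(copies))      # prints: True
print(satisfies_3_2_1(copies[:2]))  # prints: False (only two copies, none remote)
```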

31 of 37

Data preservation and sharing

32 of 37

Other preservation practices

33 of 37

Discover, Integrate, Analyze: Document steps in data processing (create a workflow)

34 of 37

Other practices for discovery, integration & analysis

35 of 37

More Training

36 of 37

Questions?

Contact Camille Andrews with any questions (ceandrews01@wm.edu)

Feedback

37 of 37

The full slide deck may be downloaded from:

http://www.dataone.org/education-modules

Suggested citation:

DataONE Education Module: Data Management. DataONE. Retrieved Nov 16, 2016. From http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx

Copyright license information:

No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.
