1 of 13

Planning Research

Organizing and Documenting Your Research Project and Data

2 of 13

Learning Objectives

  • Understand the importance of READMEs as a way to document the steps of a research project
  • Create a README for a research project
  • Set up the folder structure (including file names) for a project
  • Document the decision making and the research process
  • Publish the software / pipelines developed as part of the project
  • Set up a DOI for your software
  • Register a research project (https://osf.io/registries)
  • Find the minimal metadata standards for the data collected

3 of 13

README Activity

Thinking about your own data: What would you (or another researcher) need to know to reuse or reproduce your data?

https://pollev.com/trishaadamus254

Review

READMEs:

Sort the README files and be prepared to describe why you sorted them in this way.

Who do I contact if I have questions about the data?

Do you need specific equipment/software to analyze/understand the data?

When was the research conducted?

What are the units?

Where was the research data collected/created?

What are the variables?

How was the research funded?

How can I reuse this data (licensing)?

4 of 13

Documentation With READMEs

  • Commonly used to document software installation, and can also be used for:
    • Research project workflows
    • Describing datasets
  • Typically a .txt or .md (markdown) file format
    • These plain-text formats can be opened in any text editor on any operating system

5 of 13

What to Include in a README?

  • Title of research project and/or dataset
  • Names, contact information, and institutional affiliation for those associated with the project
  • Funding sources or institutional support
  • A list of files and folders, with a description of their contents and how to use them
    • For each file name, a short list of the data that it contains
    • Date the file was created

6 of 13

What to Include in a README...continued

  • Methodological information
    • Description of methods for data collection or generation
      • Include links or references to publications or other documentation containing experimental design or protocols used
    • Description of methods used for data processing and analysis
  • Limitations of the data or the project
  • Data-specific information (repeat this as much as necessary)
    • Variable names and abbreviations, definitions of column names for tabular data
    • Units of measurement
    • Definitions for how missing data is recorded
  • Sharing and access information
    • Copyright and licensing, restrictions on data, citation
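
A minimal sketch of how the items above could be assembled into a plain-text README. The function name, section labels, and every example value (project title, names, file names, grant number) are hypothetical placeholders, not a required template.

```python
def build_readme(title, contacts, funding, files, methods,
                 variables, missing_code, license_text):
    """Assemble README text from the pieces listed on the slides."""
    lines = [f"TITLE: {title}", "", "CONTACTS"]
    # One line per person: name, institutional affiliation, contact email
    lines += [f"  {name}, {affiliation}, {email}"
              for name, affiliation, email in contacts]
    lines += ["", f"FUNDING: {funding}", "", "FILES"]
    # One line per file: name, creation date, short description
    lines += [f"  {fname} (created {created}): {desc}"
              for fname, created, desc in files]
    lines += ["", f"METHODS: {methods}", "", "VARIABLES"]
    # One line per variable: name, unit of measurement, definition
    lines += [f"  {var} [{unit}]: {definition}"
              for var, unit, definition in variables]
    lines += [f"  Missing data recorded as: {missing_code}", "",
              f"LICENSE: {license_text}"]
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only
readme_text = build_readme(
    title="Example Lake Buoy Temperature Study",
    contacts=[("A. Researcher", "Example University", "researcher@example.edu")],
    funding="Example Foundation grant 0000",
    files=[("Mendota_Buoy6_20180722_v3.csv", "2018-07-22",
            "Water temperature readings from one buoy")],
    methods="See protocol document in docs/ for sensor setup",
    variables=[("water_temp", "degC", "Water temperature at the surface")],
    missing_code="NA",
    license_text="CC BY 4.0",
)
print(readme_text)
```

Saving the returned text as README.txt in the top-level project folder keeps the documentation next to the data it describes.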

7 of 13

Metadata & Research Data Documentation

  • Documentation for your research project and data should contain the minimum information required to reuse the data that it describes
  • Methods for documenting data:
    • Data dictionary: explain variable names and values, typically for tabular data
    • README: can be used to describe software, datasets, or details of a research project
    • Embedded metadata: often in a machine-readable format (.xml or .json), following a general or discipline-specific standard
    • Data paper: a paper that describes a dataset, usually accompanied by metadata
    • Codebook: provides descriptions and definitions about the variables and values in a dataset, and how variables relate
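
As an illustration of the data dictionary option above, the sketch below writes one as a small CSV table using Python's standard csv module. The column headings and the two example variables are assumptions chosen for illustration; a discipline-specific standard may call for different fields.

```python
import csv
import io

# Hypothetical variables, for illustration only
data_dictionary = [
    {"variable": "water_temp", "definition": "Water temperature at the surface",
     "unit": "degC", "allowed_values": "-5 to 40", "missing_code": "NA"},
    {"variable": "buoy_id", "definition": "Identifier of the buoy",
     "unit": "none", "allowed_values": "1 to 10", "missing_code": "NA"},
]


def dictionary_to_csv(rows):
    """Return the data dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(dictionary_to_csv(data_dictionary))
```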

8 of 13

File Naming

  • When planning a new project, think about establishing organizational conventions within your lab/group
  • Be brief: choose 3-4 key pieces of information about the file to use in a file name

Mendota_Buoy6_20180722_v3

  1. The lake and the buoy that the data was collected from
  2. The date that the data was gathered (written in a standard format)
  3. The version number of the document
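
The naming convention in this example can be generated and checked programmatically. The sketch below is one possible approach; the function names and the exact pattern (letters for the lake, "Buoy" plus a number, an 8-digit date, "v" plus a version) are assumptions based on the single example file name.

```python
import re
from datetime import date, datetime

# Pattern inferred from the example name Mendota_Buoy6_20180722_v3
NAME_PATTERN = re.compile(
    r"^(?P<lake>[A-Za-z]+)_(?P<buoy>Buoy\d+)_(?P<date>\d{8})_v(?P<version>\d+)$"
)


def build_name(lake, buoy_number, collected, version):
    """Build a file name (without extension) from its key pieces."""
    return f"{lake}_Buoy{buoy_number}_{collected:%Y%m%d}_v{version}"


def parse_name(name):
    """Split a file name back into its pieces, or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {name!r}")
    return {
        "lake": match["lake"],
        "buoy": match["buoy"],
        "date": datetime.strptime(match["date"], "%Y%m%d").date(),
        "version": int(match["version"]),
    }


name = build_name("Mendota", 6, date(2018, 7, 22), 3)
parsed = parse_name(name)
print(name, parsed)
```

Writing the date as YYYYMMDD means that sorting file names alphabetically also sorts them chronologically.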

9 of 13

File Naming: Other Tips

  • File names that are too long may not work well with certain software
  • Avoid special characters: ! @ # $ % ^ & * ( ) | { } [ ] < > / ? " '
  • Don’t use spaces
    • Instead, use:
      • Underscores: file_name.xxx
      • Dashes: file-name.xxx
      • No separation: filename.xxx
      • Camel case: FileName.xxx
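
A minimal sketch of how existing file names could be cleaned up to follow these tips. The function name is hypothetical, and the rule applied here (keep only letters, digits, underscores, and dashes) is stricter than the list of characters above.

```python
import os
import re


def sanitize_filename(name, separator="_"):
    """Replace spaces with a separator and drop special characters."""
    stem, extension = os.path.splitext(name)
    # Turn each run of whitespace into a single separator
    stem = re.sub(r"\s+", separator, stem.strip())
    # Keep only letters, digits, underscores, and dashes
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    # Collapse repeated separators left behind by removed characters
    stem = re.sub(r"[_-]{2,}", separator, stem)
    return stem + extension


print(sanitize_filename("Lake data (final)!.csv"))
```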

10 of 13

File Organization

  • A well-organized hierarchical folder structure should align with your file-naming conventions
  • Balance breadth and depth when creating a hierarchy
    • Limit the number of top-level folders and the number of nested folders
    • Too many nested folders → data is difficult to access
    • Too many files in a folder → data is cluttered and difficult to find
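
One way to keep a hierarchy consistent across projects is to create it from a short list. The sketch below uses Python's pathlib; the folder names are a hypothetical layout, not a recommended standard, and the depth check is only a rough indicator of nesting.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: a few top-level folders, at most two levels deep
LAYOUT = ["data/raw", "data/processed", "code", "docs", "results"]


def create_layout(root, layout=LAYOUT):
    """Create the project folders and an empty README under root."""
    root = Path(root)
    for relative in layout:
        (root / relative).mkdir(parents=True, exist_ok=True)
    (root / "README.txt").touch()
    return root


def max_depth(root):
    """Return how many folder levels exist below root."""
    root = Path(root)
    depths = [len(path.relative_to(root).parts)
              for path in root.rglob("*") if path.is_dir()]
    return max(depths, default=0)


# Demonstrate in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    project = create_layout(Path(tmp) / "example_project")
    print(sorted(p.name for p in project.iterdir()), max_depth(project))
```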

11 of 13

Publishing Your Software / Pipeline

  • Why?
    • Helps ensure the software is citable, preserved, and accessible, which supports reproducibility, replication, and transparency
    • It’s also increasingly required by funders
  • Where?
    • You can store your code on GitHub, where it is easy to update and share with collaborators while working on your project
    • Archive it somewhere like Zenodo or Figshare
      • These repositories provide a citable DOI (Digital Object Identifier) for more sustained access

12 of 13

Setting Up a DOI

  • DOIs are persistent identifiers that you can get for your research materials
    • Datasets, software, papers, presentations, and other research output
  • DOIs help people easily locate research outputs
  • Reputable research repositories typically provide DOIs as part of their service
    • Examples: Zenodo, OSF, Figshare, Dryad
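
A DOI can be turned into a stable link by appending it to the https://doi.org/ resolver. The sketch below shows this; the function name is hypothetical, the format check is a loose pattern rather than an official validation rule, and the DOI in the example is made up for illustration.

```python
import re

# Loose check: "10.", a numeric registrant prefix, a slash, then a suffix
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return the resolver URL for a DOI, accepting common written forms."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Hypothetical DOI, for illustration only
print(doi_to_url("doi:10.1234/example.5678"))
```

Citing the resolver link rather than a repository page address means the citation keeps working if the repository reorganizes its website.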

13 of 13

Register Your Research Project

  • Open Science Framework (OSF) provides project registrations
  • A registration is a frozen, time-stamped copy of an OSF project
  • You can register your research project at various points of the process
    • Before you start data collection
    • When you submit a manuscript for review
    • Once the project is complete

https://help.osf.io/hc/en-us/articles/360019930893-Register-Your-Project