1 of 26

Project-oriented Workflows

Or: what I’m still learning about collaboration & reproducibility

Dr. Corey Clatterbuck

Senior Ecologist, California Coastal Commission

IEP Data Science Workgroup

2 of 26

Warning: hardly any original content here

J. Bryan, 2017. Project-oriented workflow. Retrieved from: https://www.tidyverse.org/blog/2017/12/workflow-vs-script/

J. Bryan, J. Hester, S. Pileggi, E. D. Aja. What They Forgot to Teach You About R. Retrieved from: https://rstats.wtf

S. Pileggi, 2023. PIPING HOT DATA: Project Oriented Workflow. Retrieved from: https://www.pipinghotdata.com/talks/2023-09-11-project-oriented-workflows/

Openscapes, 2023. Better Science in Less Time, Openscapes Champions Series. Retrieved from: https://openscapes.github.io/series/core-lessons/better-science.html

J. Bryan, 2022. How to name files. Retrieved from: https://www.youtube.com/watch?v=ES1LTlnpLMk

B. Rodrigues, 2023. Building reproducible analytical pipelines with R. Retrieved from: https://raps-with-r.dev

3 of 26

My motivation…

Technical: Physical and biological scientists are rarely exposed to best computing practices

Social: Better science for future us (yourself, your collaborators, fellow scientists)

Diagram from Openscapes: Tooling (tools & practices) and People (teams & community)

4 of 26

My motivation…may differ from yours

Decide how much you care about this

Idea from Bryan 2017: Project-oriented workflow

5 of 26

What are project-oriented workflows?

  • Project files & sub-folders live in the same, top-level folder
  • Multiple projects can be open at the same time, each running in its own R process
  • Projects are functionally portable → should work similarly on another computer

6 of 26

What are project-oriented workflows?

Figure from Pileggi 2023: Project Oriented Workflows

7 of 26

What are project-oriented workflows?

Figure from What They Forgot To Teach You About R, Section 3.5

8 of 26

R Project, swamp-manual.Rproj

9 of 26

R Project

Which of these persists after running rm(list=ls())?

Exercise from Pileggi 2023: Project Oriented Workflows

Option                                  Persists? (Y/N)
A. library(dplyr)
B. summary <- head
C. options(stringsAsFactors = FALSE)
D. Sys.setenv(LANGUAGE = "de")
E. x <- 1:5
F. attach(palmerpenguins)

10 of 26

R Project

Which of these persists after running rm(list=ls())?

Exercise from Pileggi 2023: Project Oriented Workflows

Option                                  Persists? (Y/N)
A. library(dplyr)                       Y
B. summary <- head                      N
C. options(stringsAsFactors = FALSE)    Y
D. Sys.setenv(LANGUAGE = "de")          Y
E. x <- 1:5                             N
F. attach(palmerpenguins)               Y

11 of 26

R Project

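Forget using rm(list=ls()) at the top of scripts. It only removes objects from the workspace; attached packages, options, and environment variables all survive. What you likely want is to restart your R session.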
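The exercise's answers can be checked with a short sketch like the one below. It is an illustration, not part of the original exercise: demo.option and DEMO_LANGUAGE are made-up names chosen so the demo does not change real session settings, and the tools package stands in for dplyr because it ships with R.

```r
# Hypothetical names used only for this demonstration.
x <- 1:5                           # an object in the workspace
summary <- head                    # masks base::summary in the workspace
library(tools)                     # attaches a package that ships with R
options(demo.option = TRUE)        # sets a (made-up) session option
Sys.setenv(DEMO_LANGUAGE = "de")   # sets a (made-up) environment variable

rm(list = ls())                    # clears workspace objects only

# What is left afterwards?
x_survived       <- exists("x", inherits = FALSE)         # was x kept?
summary_restored <- identical(summary, base::summary)     # mask removed?
pkg_attached     <- "package:tools" %in% search()         # package still attached?
option_kept      <- isTRUE(getOption("demo.option"))      # option still set?
env_kept         <- Sys.getenv("DEMO_LANGUAGE")           # variable still set?
```

Only the two workspace objects (x and the masked summary) are gone. Restarting the R session is what resets the rest.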

12 of 26

R Project

Within RStudio, Tools → Global options

Figure from Pileggi 2023: Project Oriented Workflows

13 of 26

What are project-oriented workflows?

  • Project files & sub-folders live in the same, top-level folder
  • Multiple projects can be open at the same time, each running in its own R process
  • Projects are functionally portable → should work similarly on another computer

setwd()

path <- "C:/…"

here()

file name convention

😊

14 of 26

What’s wrong with setting the directory?

setwd() hard-codes a file path that is unlikely to exist on any computer but your own.

R Projects, as we discussed, set up your working directory automatically.

How can we navigate elsewhere?

15 of 26

How does here() work?

TL;DR: when you load library(here) inside an R Project, here() builds file paths relative to the project root, i.e. the folder where the .Rproj file lives.

16 of 26

How does here() work?

Together, R Projects and here() make writing relative file paths in your project simple & transportable to anyone’s computer!
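A minimal sketch of the idea, assuming a hypothetical file data/survey.csv inside the project. It calls here::here() directly, which behaves the same as loading library(here) first. The fallback branch is only there so the sketch still runs where the here package is not installed; it relies on the working directory, which an R Project sets to the project root when opened.

```r
# Hypothetical project file: <project root>/data/survey.csv
if (requireNamespace("here", quietly = TRUE)) {
  # here() finds the project root and builds the path from there
  survey_path <- here::here("data", "survey.csv")
} else {
  # Fallback (assumption): the working directory is the project root
  survey_path <- file.path(getwd(), "data", "survey.csv")
}

# The same line works on any computer; no "C:/…" prefix is typed by hand
survey_path
# survey <- read.csv(survey_path)  # uncomment once the file exists
```

Each path component is passed as a separate argument, so no one has to choose between "/" and "\\" as the separator.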

17 of 26

Why should I care about file names?

Ever had to:

  • open a file to figure out what’s in it?
  • determine which file is the most recent version?

Good file names may vary in appearance but always improve communication & efficiency.

18 of 26

Principles of file naming (Jenny Bryan)

File names should be:

  • Human readable
  • Machine readable
  • Amenable to default sorting

19 of 26

What makes some file names better than others?

Figure from Bryan 2022: How To Name Files

20 of 26

Human readable file names

Find what you need quickly. Use few contextless abbreviations (know your audience).

21 of 26

Machine readable file names

Contain no spaces or special punctuation

Use delimiters deliberately (regex coming up!)

Figure from Bryan 2022: How To Name Files
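To see why deliberate delimiters pay off, here is a small sketch with made-up file names. It assumes one possible convention: "_" separates fields and "-" separates words within a field.

```r
# Hypothetical file names: date_survey_site.csv
files <- c(
  "2023-09-11_fish-survey_site-01.csv",
  "2023-09-12_fish-survey_site-02.csv",
  "notes from tuesday (final).csv"      # not machine readable
)

# Regex: keep only the names that start with an ISO 8601 date and "_"
ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}_", files)
good <- files[ok]

# Split on "_" to recover the metadata stored in each name
stem  <- tools::file_path_sans_ext(good)
parts <- do.call(rbind, strsplit(stem, "_", fixed = TRUE))
meta  <- data.frame(
  date   = as.Date(parts[, 1]),
  survey = parts[, 2],
  site   = parts[, 3],
  stringsAsFactors = FALSE
)
meta
```

The third name is dropped by the pattern; the other two turn into a tidy table without anyone opening a file.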

22 of 26

“Embrace the slug” for human & machine readability

The slug is the last part of a URL, the part that tells a human what the page is about.

Compare:

https://www.youtube.com/watch?v=ES1LTlnpLMk

vs

https://github.com/jennybc/how-to-name-files
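The same idea can be applied to file names. The helper below, slugify(), is a hypothetical function written for this sketch (it is not from any package): it lower-cases a title and replaces every run of non-alphanumeric characters with a single hyphen.

```r
# Hypothetical helper: turn a human title into a slug
slugify <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "-", x)   # runs of other characters become "-"
  gsub("^-+|-+$", "", x)            # trim leading and trailing hyphens
}

slug <- slugify("How to Name Files!")
slug
# "how-to-name-files"
```

This simple version only handles unaccented Latin letters and digits; anything else is treated as a separator.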

23 of 26

Amenable to default sorting

Left-pad numbers

For dates, use ISO8601 (YYYY-MM-DD)

Logical: puts like with like vs Chronological: ordered by date

Figure from Bryan 2022: How To Name Files
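Both rules can be tried out in a few lines. The script names and dates below are made up for illustration, and sort() is called with method = "radix" so the result does not depend on the computer's locale.

```r
# Left-pad numbers: hypothetical script names
steps    <- c(1L, 2L, 10L)
labels   <- c("import", "clean", "plot")
unpadded <- paste0(steps, "_", labels, ".R")
padded   <- paste0(sprintf("%02d", steps), "_", labels, ".R")

sort(unpadded, method = "radix")  # "10_plot.R" jumps ahead of "1_import.R"
sort(padded,   method = "radix")  # "01_import.R" "02_clean.R" "10_plot.R"

# ISO 8601 dates: alphabetical order is also chronological order
iso <- c("2023-12-01", "2024-01-15", "2023-02-28")
mdy <- c("12-01-2023", "01-15-2024", "02-28-2023")

iso_sorted <- sort(iso, method = "radix")  # earliest date first
mdy_sorted <- sort(mdy, method = "radix")  # starts with the 2024 date
```

With padding and YYYY-MM-DD, the default sort in a file browser matches the order you actually want.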

26 of 26

Make your path forward sharable & efficient

Figure by Allison Horst, for Openscapes