Project-oriented Workflows
Or: what I’m still learning about collaboration & reproducibility
Dr. Corey Clatterbuck
Senior Ecologist, California Coastal Commission
IEP Data Science Workgroup
Warning: hardly any original content here
J. Bryan, 2017. Project-oriented workflow. Retrieved from: https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
J. Bryan, J. Hester, S. Pileggi, E. D. Aja. What They Forgot to Teach You About R. Retrieved from: https://rstats.wtf
S. Pileggi, 2023. PIPING HOT DATA: Project Oriented Workflow. Retrieved from: https://www.pipinghotdata.com/talks/2023-09-11-project-oriented-workflows/
Openscapes, 2023. Better Science in Less Time, Openscapes Champions Series. Retrieved from: https://openscapes.github.io/series/core-lessons/better-science.html
J. Bryan, 2022. How to name files. Retrieved from: https://www.youtube.com/watch?v=ES1LTlnpLMk
B. Rodrigues, 2023. Building reproducible analytical pipelines with R. Retrieved from: https://raps-with-r.dev
My motivation…
Technical: Physical and biological scientists are rarely exposed to best computing practices
Social: Better science for future us (yourself, your collaborators, fellow scientists)
Tooling: tools & practices
People: teams & community
Diagram from Openscapes
My motivation…may differ from yours
Decide how much you care about this.
Idea from Bryan 2017: Project-oriented workflow
What are project-oriented workflows?
Figure from Pileggi 2023: Project Oriented Workflows
What are project-oriented workflows?
Figure from What They Forgot To Teach You About R, Section 3.5
R Project, swamp-manual.Rproj
R Project
Which of these persists after running rm(list=ls())?
Exercise from Pileggi 2023: Project Oriented Workflows
Option | Persists? (Y/N) |
| |
B. summary <- head | |
C. options(stringsAsFactors = FALSE) | |
D. Sys.setenv(LANGUAGE = "de") | |
E. x <- 1:5 | |
F. attach(palmerpenguins) | |
R Project
Which of these persists after running rm(list=ls())?
Exercise from Pileggi 2023: Project Oriented Workflows
Option | Persists? (Y/N) |
| Y |
B. summary <- head | N |
C. options(stringsAsFactors = FALSE) | Y |
D. Sys.setenv(LANGUAGE = "de") | Y |
E. x <- 1:5 | N |
F. attach(palmerpenguins) | Y |
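A minimal sketch for checking these answers yourself in a fresh R session. It uses only base R; `tools` and `mtcars` are stand-ins chosen here (assumptions, not part of the original exercise) because they ship with every R installation.

```r
# Set up one example of each kind of state from the exercise.
x <- 1:5                               # ordinary object
summary <- head                        # user object that masks base::summary
options(stringsAsFactors = FALSE)      # global option
Sys.setenv(LANGUAGE = "de")            # environment variable
library(tools)                         # stand-in for any attached package
attach(mtcars)                         # stand-in for an attached data frame

rm(list = ls())                        # the "clean slate" many scripts start with

# What is left? Only the user-created objects are gone.
x_survives        <- exists("x", inherits = FALSE)
summary_is_base   <- identical(summary, base::summary)
option_survives   <- identical(getOption("stringsAsFactors"), FALSE)
envvar_survives   <- identical(Sys.getenv("LANGUAGE"), "de")
package_survives  <- "package:tools" %in% search()
attached_survives <- "mtcars" %in% search()
```

Printing the six results shows that the objects are removed while the option, the environment variable, the attached package, and the attached data frame all remain. Restarting R (in RStudio: Session → Restart R) clears all of them at once.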
R Project
Skip rm(list = ls()) at the top of scripts. What you likely want is to restart your R session.
R Project
Within RStudio, Tools → Global options
Figure from Pileggi 2023: Project Oriented Workflows
What are project-oriented workflows?
setwd()
path <- "C:/…"
here()
file name convention
😊
What’s wrong with setting the directory?
setwd() hard-codes a directory path that is unlikely to exist on any computer but your own.
R Projects, as we discussed, set up your working directory automatically.
How can we navigate elsewhere?
How does here() work?
TL;DR: inside an R Project, here() builds file paths relative to the project root, the folder where the .Rproj file lives.
How does here() work?
Together, R Projects and here() make writing relative file paths in your project simple & transportable to anyone’s computer!
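A hedged sketch of that contrast, assuming the here package is installed. The `data/raw/survey.csv` layout is hypothetical and only there for illustration.

```r
library(here)

# Fragile: setwd() with an absolute path works on one machine only.
# Portable: build the path from the project root that here() detects.
# "data", "raw", and "survey.csv" are made-up names for this example.
csv_path <- here("data", "raw", "survey.csv")

# The same line works on any computer that has the project folder.
# survey <- read.csv(csv_path)   # uncomment once such a file exists
```

Calling `here()` with no arguments prints the root it detected, which is a quick way to confirm you are in the project you think you are in.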
Why should I care about file names?
Ever had to:
Good file names may vary in appearance but always improve communication & efficiency.
Principles of file naming (Jenny Bryan)
File names should be: human readable, machine readable, and amenable to default sorting.
What makes some file names better than others?
Figure from Bryan 2022: How To Name Files
Human readable file names
Find what you need quickly. Use few contextless abbreviations (know your audience).
VS.
Machine readable file names
Contain no spaces or special punctuation
Use delimiters deliberately (regex coming up!)
Figure from Bryan 2022: How To Name Files
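A small base-R sketch of why deliberate delimiters pay off. The file names are invented, and the convention assumed here is "_" between fields and "-" between words within a field.

```r
# Hypothetical file names: date_topic_site.csv
files <- c(
  "2023-06-01_fish-survey_site-01.csv",
  "2023-06-15_fish-survey_site-02.csv",
  "2023-07-01_water-quality_site-01.csv"
)

# Drop the extension, then split each name into its fields on "_".
stems  <- tools::file_path_sans_ext(files)
fields <- do.call(rbind, strsplit(stems, "_", fixed = TRUE))

# The file names themselves become a small metadata table.
meta <- data.frame(
  date  = as.Date(fields[, 1]),
  topic = fields[, 2],
  site  = fields[, 3]
)

# Predictable names also make pattern matching easy.
fish_files <- grep("_fish-survey_", files, value = TRUE, fixed = TRUE)
```

Because every name follows the same pattern, the metadata can be recovered without opening a single file.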
“Embrace the slug” for human & machine readability
The slug is the end of the URL that communicates to humans what you’re seeing.
Compare:
https://www.youtube.com/watch?v=ES1LTlnpLMk
vs
https://github.com/jennybc/how-to-name-files
Amenable to default sorting
Left-pad numbers
For dates, use ISO8601 (YYYY-MM-DD)
Logical: puts like with like vs Chronological: ordered by date
Figure from Bryan 2022: How To Name Files
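A short base-R sketch of both sorting tips, with invented file names. `method = "radix"` is used so the sort order does not depend on the computer's locale.

```r
# Unpadded numbers: 10 sorts before 2 because names are compared as text.
unpadded        <- paste0("fig", c(1L, 2L, 10L), ".png")
sorted_unpadded <- sort(unpadded, method = "radix")

# Left-padded numbers: text order now matches numeric order.
padded        <- sprintf("fig%02d.png", c(1L, 2L, 10L))
sorted_padded <- sort(padded, method = "radix")

# ISO 8601 dates: text order matches chronological order.
iso_dates  <- c("2023-10-01", "2023-02-15", "2022-12-31")
sorted_iso <- sort(iso_dates, method = "radix")

# The same dates written month-first group by month, not by time.
mdy_dates  <- format(as.Date(iso_dates), "%m-%d-%Y")
sorted_mdy <- sort(mdy_dates, method = "radix")
```

Comparing `sorted_iso` with `sorted_mdy` shows why the year has to come first for chronological ordering.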
Make your path forward sharable & efficient
Figure by Allison Horst, for Openscapes