1 of 26

Openscapes Champions Program

CC BY Openscapes

Linked from: https://openscapes.org/series

Last updated 2025-05-20


Coding strategies for Future Us

2 of 26

Thinking for reuse

  • Often we are (or think we are) doing our data work only on our own
  • When we want to facilitate collaboration or focus on reproducibility, we need new strategies:
    • Project organization
    • File naming conventions and file path specification
    • Communication methods
  • We’re using R examples for the sake of specificity. The aim is to present ideas & workflows useful for coders and for folks who do not (yet?) identify as coders.
  • What They Forgot to Teach You About R: most of this comes from Bryan & Hester’s awesome course. Jenny Bryan is a hero in the R world: an early adopter teaching R/GitHub as a professor at the University of British Columbia, now at Posit.


3 of 26

Source files: scripts & notebooks


Saving code is an absolute requirement for reproducibility. (Future you, future us)

Save commands as “scripts” (.R, .py) or “notebooks” (.Rmd, .qmd, .ipynb). It doesn't have to be polished. Just save it!

  • Everything that really matters should be achieved through code that you save
    • Including creating figures
    • Contrast: a series of unrecorded mouse clicks

  • The process is important; the product is just an outcome
    • Outputs should be treated as disposable
    • Like a recipe: with ingredients & explicit instructions you can recreate it

4 of 26

Name files deliberately

Jenny Bryan’s 3 rules for Naming Things (2015)

  • machine readable (letters, numbers, “_”, and “-”)
  • human readable
  • plays well with default ordering
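The three rules can be checked in a few lines of R. The file names below are hypothetical, and the regular expression is just one way to encode “letters, numbers, _ and -” (plus a single file extension); it is a sketch, not a standard.

```r
# Hypothetical file names, checked against the three naming rules.
bad  <- c("Fig 10.png", "fig 2.png", "figure1 final.png")
good <- c("2022-12-04_fig-01_scatter.png",
          "2022-12-04_fig-02_histogram.png",
          "2022-12-04_fig-10_map.png")

# Machine readable: only letters, numbers, "_" and "-", plus one file extension.
is_machine_readable <- function(x) grepl("^[A-Za-z0-9_-]+\\.[A-Za-z0-9]+$", x)
is_machine_readable(bad)
is_machine_readable(good)

# Plays well with default ordering: ISO 8601 dates (YYYY-MM-DD) and
# zero-padded numbers sort in the intended order.
sprintf("fig-%02d", c(1, 2, 10))
sort(good)

# Machine readable also means metadata is easy to recover by splitting on "_".
strsplit(good, "_", fixed = TRUE)
```

Without the zero padding, "fig-10" would sort before "fig-2", because sorting compares characters, not numbers.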


Jenny Bryan “Naming things” video (5 mins) from NormConf · Dec 4, 2022

5 of 26

Organize your work in projects

One folder per project.

Report? R package? Chapter? Website? Whatever.

Can be the same unit as a GitHub repo.

If using RStudio, it’s an RStudio Project (capital P).

  • Each Project gets its own R instance
  • R starts with the project root as its working directory, so all paths are relative to the project's folder.

Work on multiple projects at once w/ multiple instances of RStudio (or other software/IDE)


Project folder example: github.com/benmarwick/rrtools

6 of 26

Filepath Preamble

  • Every saved thing gets a unique path
  • Your code needs to run from somewhere specific. And when it interacts with other things (data or other code), you need to tell your code where things are
  • Project-oriented workflows allow relative file paths, which are portable
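Why relative paths travel well can be sketched with base R's file.path(). The two project roots below are made up, standing in for wherever two collaborators happen to keep the same project; only the root differs, never the relative part.

```r
# A project-relative path: no drive letter, no home directory, no user name.
rel <- file.path("data", "raw_foofy_data.csv")

# The same relative path works from wherever each person keeps the project
# (both roots are hypothetical examples).
jenny <- file.path("/Users/jenny/projects/foofy", rel)
ana   <- file.path("C:/Users/ana/work/foofy", rel)

# R resolves a relative path against the current working directory,
# which in a project-oriented workflow is the project root.
file.path(getwd(), rel)
```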


7 of 26

setwd("path/that/only/works/on/my/machine")

In R, we set the “working directory” file path using the command setwd()

  • The chance of setwd() having the desired effect -- making the file paths work -- for anyone besides its author is 0%.
  • It's also unlikely to work for the author one or two years or computers from now.
  • Hard-wired, absolute paths, especially when sprinkled throughout the code, make a project brittle. Such code does not travel well across time or space.

Instead, be deliberate with file paths...


# Anti-pattern: hard-wired absolute paths that only work on one machine
library(ggplot2)
setwd("/Users/jenny/cuddly_broccoli/verbose/foofy/data")
df <- read.csv("raw_foofy_data.csv")
p <- ggplot(df, aes(x, y)) + geom_point()
ggsave("/Users/jenny/cuddly_broccoli/ambiguous/fig.png", plot = p)
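A more portable version of the foofy script might look like the sketch below. It assumes R starts in the project root (as it does in an RStudio Project) and uses hypothetical data/ and figs/ subfolders. It swaps base-R graphics in for ggplot2 only so the sketch runs without extra packages; the setwd() call appears solely to fake a project root for the demo.

```r
# --- demo setup only: fake a project root containing some raw data ---
proj <- file.path(tempdir(), "foofy_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "figs"), showWarnings = FALSE)
write.csv(data.frame(x = 1:5, y = c(2, 3, 5, 7, 11)),
          file.path(proj, "data", "raw_foofy_data.csv"), row.names = FALSE)
old_wd <- setwd(proj)  # stands in for "R starts at the project root"

# --- the script itself: every path is relative to the project root ---
df <- read.csv(file.path("data", "raw_foofy_data.csv"))
pdf(file.path("figs", "foofy_scatterplot.pdf"))
plot(df$x, df$y, xlab = "x", ylab = "y")
invisible(dev.off())

setwd(old_wd)  # demo cleanup: restore the original working directory
```

With ggplot2, the plotting lines would become the original ggplot() call followed by ggsave(file.path("figs", "foofy_scatterplot.png"), plot = p).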

8 of 26

Strategies for re-use: use relative file paths

If you’re using setwd() in your scripts, that’s OK, but be very disciplined:

  • Only use it at the very start of a file, i.e. an obvious & predictable place.

  • Always set working directory to the same thing: top-level of the project. Build subsequent paths relative to this.


    • Aside for R users: use the here package (here.r-lib.org), which builds paths from the project root for you
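The here package builds file paths from the project root, wherever a script is run from. The helper below is a simplified, hypothetical sketch of that idea in base R, not here's actual implementation: walk up from the current folder until one contains a marker file. The marker names (.here, .git, a *.Rproj file) are assumptions for this sketch.

```r
# Hypothetical helper, NOT the here package's code: walk up the folder tree
# from `start` until a folder contains a marker file (.here, .git, or an
# RStudio *.Rproj file), then return that folder as the project root.
find_project_root <- function(start = getwd(), markers = c(".here", ".git")) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    has_marker <- any(file.exists(file.path(dir, markers))) ||
      length(list.files(dir, pattern = "\\.Rproj$")) > 0
    if (has_marker) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) stop("No project root found above: ", start)
    dir <- parent  # not found yet: move up one level
  }
}

# Hypothetical helper: build a path starting from the project root.
proj_path <- function(...) file.path(find_project_root(), ...)

# Demo: a throwaway "bhi" project whose root is marked by an empty .here
# file, searched from inside its bhi/baltic subfolder.
demo_root <- file.path(tempdir(), "bhi")
dir.create(file.path(demo_root, "baltic"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_root, ".here")))
found <- find_project_root(file.path(demo_root, "baltic"))
found
```

With the here package itself installed, the equivalent call is here::here("baltic"), which returns the path to the baltic subfolder of the project root.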

9 of 26

Strategies for re-use: Start R with a blank slate

  • Saving code is an absolute requirement for reproducibility.
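  • When you quit, do not save the workspace to an .RData file. When you launch, do not reload the workspace from an .RData file.
  • In RStudio, set this via Tools > Global Options > General: uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”.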

10 of 26

Strategies for re-use: Avoid rm(list = ls())

In R, to start with a clean slate, it's common to see scripts begin with this object-nuking command: rm(list = ls())

  • The problem is that, given the intent, it does not go far enough:
    • It only deletes user-created objects from the global workspace; attached packages, options() settings, and the working directory are left as they were.
    • If someone runs your script while trying to help you, it deletes their user-created objects too.
    • It is highly suggestive of a non-reproducible workflow.
  • Instead, restart R with a clean slate OFTEN (e.g. many times a day), and write every script assuming it will be run in a fresh R process.
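Why rm(list = ls()) falls short can be sketched in base R: the objects disappear, but options and attached packages survive, whereas a brand-new R process (started here with Rscript --vanilla) begins truly clean. The option and package chosen are arbitrary examples.

```r
options(digits = 3)            # a non-default option
library(tools)                 # attach a package that ships with R
x <- 42                        # a user-created object

rm(list = ls())                # the "object-nuking" command

exists("x")                    # the object is gone ...
getOption("digits")            # ... but digits is still 3,
"package:tools" %in% search()  # ... and tools is still attached.

# A fresh R process starts with none of that state. Rscript's --vanilla flag
# also skips .RData, .Rprofile and similar startup files.
script <- tempfile(fileext = ".R")
writeLines('cat(exists("x"), getOption("digits"), "\\n")', script)
out <- system2(file.path(R.home("bin"), "Rscript"),
               c("--vanilla", shQuote(script)), stdout = TRUE)
out
```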


11 of 26

Iterating & communicating with “literate programming”

Quarto documents and Jupyter notebooks combine code + text + output (tables, figures)

Analyses & figures in the same place as reporting document: saves time as you iterate!

Enables good practices for reproducibility & versioning

Simple text formatting

Code – R, Python, SQL, bash, others

12 of 26

Quarto

Quarto’s familiar outputs for science: Word documents and PDFs

Our Quarto file renders to:

Word!

Imagine never copy-pasting a graph into your report again!!!!

Quarto can also manage citations and cross-references to figures and section headers.

PDF!

13 of 26

Quarto

Quarto creates HTML files that can be shared openly on the web

We can store and distribute HTML files on GitHub, and combine them into websites, slides, and books

>> Reimagine sharing, sci comm, engagement, inclusion

Our Quarto file renders to:

HTML!

Suddenly you can share a URL rather than attaching a file!

And that same URL updates in place: no more re-attaching a new version of the file!

14 of 26

Further resources:

Workflows for (data) scientists with Python and R

  • R for Excel Users - Lowndes & Horst
    • R/RStudio workflows with tidyverse, RMarkdown, and GitHub, using ecological data from LTER (updated from OHI’s Intro to Data Science)
  • An introduction to Earth and Environmental Data Science - Abernathy
    • Intro to Python, JupyterLab, Unix, Git, some packages & workflows
  • Data analysis and visualization in Python for ecologists - Carpentries
    • Setup recommends using Anaconda and Jupyter Notebooks
  • Python for Data Analysis - Thompson
    • Assumes no previous experience. Also intro to the command line.
  • What They Forgot to Teach You About R - Bryan and Hester


15 of 26


16 of 26


17 of 26

Ideally, you don’t hardwire anything about your workflow into your product.

Workflow versus Product

Distinction between things you do because of personal taste & habits (“workflow”) versus the logic and output that is the essence of your project (“product”).


Workflow:

  • Editor you use to write code.
  • Name of your home directory.
  • R code you ran before lunch.

Clearly product:

  • Raw data.
  • R code someone needs to run on your raw data to get your results, including the explicit library() calls to load necessary packages (source files)

From Bryan & Hester, RStudio, What They Forgot to Teach You About R

18 of 26

19 of 26

R Markdown

R Markdown combines executable R code with simple text formatting for efficient, automatable, reproducible research

Analyses & figures in the same place as reporting document: saves time as you iterate!

Enables good practices for reproducibility & versioning

Simple text formatting

Code – R, Python, SQL, bash, others

20 of 26

R Markdown

R Markdown’s familiar outputs for science: Word documents and PDFs

Our R Markdown file renders to:

Word!

Imagine never copy-pasting a graph into your report again!!!!

R Markdown can also manage citations and cross-references to figures and section headers.

PDF!

21 of 26

R Markdown

R Markdown creates HTML files that can be shared openly on the web

We can store and distribute HTML files on GitHub, which also offers display options for publishing.

>> Enabled the Ocean Health Index team to reimagine science communication & engagement.

Our R Markdown file renders to:

HTML!

Suddenly you can share a URL rather than attaching a file!

And that same URL updates in place: no more re-attaching a new version of the file!

22 of 26

R Markdown

Transformed communication with OHI partners: methods, websites, tutorials

23 of 26

Example: Ocean Health Index

  • We had a project-oriented workflow, with projects like bhi
  • We set the working directory to a subfolder, e.g. bhi/baltic.
    • But setting relative paths wouldn’t work for our collaborators
  • We set up a convention everyone had to follow: create a ~/github/ folder in their home directory and save the bhi project there
  • So our script would say at the top:

setwd("~/github/bhi/baltic")

  • Then we transitioned to the here package (here.r-lib.org): here("baltic")
  • Also: assessments in multiple contexts using tailorable frameworks (Lowndes et al. 2015)

Documentation: ohi-science.org/ohi-global-guide


24 of 26

R Markdown Visual Editor

Use the RStudio toolbar to add section headers, figures, tables, footnotes, citations, etc., à la Word or Google Docs.

New Python capabilities, including display of Python objects in the Environment pane

Learn more:

Introducing Visual R Markdown - J.J. Allaire’s blog

rstudio.github.io/visual-markdown-editing

25 of 26


By Allison Horst (tweet)

26 of 26

Software considerations for coding


Adapted from Tiffany Timbers, UBC Data Science, Intro to Reticulate, 2020

Type of tool needed:

  • Programming language (R, python)
  • Code editor (RStudio IDE, Jupyter)
  • Version control (git) and hosting platforms (GitHub/Bitbucket)

Choosing the “best” tool for the job:

  • Reproducible and auditable
  • Accurate
  • Collaborative (and portable)

Opinionated analysis development (Parker 2017) peerj.com/preprints/3210