1 of 15

An introduction to targets for R

R-Ladies Santa Barbara

October 11, 2023

Tracey Mangin

1

we have 15 slides!

2 of 15

Agenda

Resources

  • This presentation (hi!) ~20 minutes
    • reasonable econ-seminar style: interrupt with questions (but use your best judgment)
  • Follow along demonstration ~20 minutes
  • Example in groups (or individually, or all together!)
  • There are many resources available online!
  • I pulled information from several sources - thanks! (listed at the end)
  • Check those out!


3 of 15

In a perfect world…


[Flow diagram] import inputs → clean and process data → perform analysis → summarize/visualize outputs → we’re done hooray!

4 of 15

In a perfect world…

Reality…


[Flow diagram, perfect world] import inputs → clean and process data → perform analysis → summarize/visualize outputs → we’re done hooray!

[Flow diagram, reality] the same steps, cycled through again and again: update inputs, change cleaning, change this a thousand times and add a million more steps, “are these right?”, “next time, i promise we’ll be perfect…”

5 of 15

Workflow challenges

  1. Long run times
    1. Rerunning entire pipelines to ensure that items are up to date can take a lot of time
  2. Reproducibility
    1. Hard to know whether saved outputs actually match the current code and inputs


6 of 15

Enter targets package for R

  • Pipeline tool specifically for R
    • Pipeline tools coordinate the pieces of analysis projects (e.g., Make)
  • Keeps track of entire workflow
  • Automatically detects when files or functions change
  • Saves time by only running steps, or targets, that are no longer up to date
  • Ensures that the pipeline is run in the correct order
  • Ensures reproducibility: When targets are up to date, this is evidence that the outputs match the code and inputs
  • More trustworthy and reproducible results
  • Note: targets replaces the R tool drake


7 of 15

target explained

  • Each step in the pipeline = a target
  • Looks and feels like a variable (name)
    • Stores the returned R object in your project folder (not in the environment)
  • Function oriented
  • Target usually creates, analyzes, or summarizes a dataset/analysis
  • Good targets:
    • Large enough to save time when not run
    • Small enough that some are skipped
    • Don’t modify global environment
    • Return a single “value” or R object


function

clean_data <- function(file) {
  data <- read_csv(file) %>%
    filter(!is.na(date_time)) %>%
    mutate(day = weekdays(date_time))
  data
}

target

tar_target(name = data, command = clean_data(file))

other examples

tar_target(name = max_val, command = 16)

tar_target(name = save_data, command = simple_write(data), format = "file")

In tar_target(), name is the target’s name and command is the R code that produces it.

8 of 15

targets setup

  1. Create a project or repo
  2. Create a folder called R
    1. Save script(s) containing functions for analysis in R folder (these scripts are sourced)
  3. Add input files if storing in project
  4. Install targets (once): install.packages("targets")
  5. Run use_targets()
    • Creates required _targets.R file that runs pipeline
  6. Set options (e.g., libraries)
  7. Fill in list() with targets
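Once filled in, the pipeline file might look like the sketch below. The functions clean_data() and plot_listens() are hypothetical stand-ins assumed to be defined in a script in the R folder, and data.csv is a placeholder input file.

```r
# _targets.R: a minimal pipeline sketch (function and file names are hypothetical)
library(targets)

# Source the scripts in R/ that define clean_data() and plot_listens()
tar_source()

# Packages that the targets themselves need
tar_option_set(packages = c("readr", "dplyr", "ggplot2"))

list(
  # Track the raw input file so edits to it invalidate downstream targets
  tar_target(file, "data.csv", format = "file"),
  # Clean and process the raw data
  tar_target(data, clean_data(file)),
  # Visualize the cleaned data
  tar_target(plot, plot_listens(data))
)
```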


9 of 15

Inspect pipeline once targets filled in


functions

tar_manifest() helps check for obvious errors and produces a data frame of info about the targets in the pipeline

tar_visnetwork() visualizes the pipeline workflow

tar_glimpse() visualizes the pipeline workflow faster than tar_visnetwork(), but doesn’t account for progress info

  • to see functions, set targets_only = FALSE
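Put together, a quick inspection session might look like this (run from the project root after _targets.R is filled in):

```r
library(targets)

tar_manifest(fields = command)        # data frame of target names and commands
tar_visnetwork()                      # dependency graph with progress info
tar_glimpse()                         # faster graph, no progress info
tar_visnetwork(targets_only = FALSE)  # also show the functions targets depend on
```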

[Screenshots: example tar_visnetwork() and tar_glimpse() graphs]

10 of 15

Run the pipeline

  • tar_make() runs the pipeline
    • Runs targets in the correct order
    • Only runs targets that are out of date (time saver!)

  • Creates folder called _targets in project
    • Outputs saved in _targets/objects
    • tar_read() prints the output
    • tar_load() loads the output in the environment
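Assuming a pipeline with a target named data (as on the earlier slide), a typical session might be:

```r
library(targets)

tar_make()      # runs only out-of-date targets, in dependency order
tar_read(data)  # print the stored object without attaching it
tar_load(data)  # create `data` in the global environment
```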


11 of 15

Making changes and rerunning

  • Reruns: targets identifies which parts of the pipeline are outdated and only reruns those… a real time saver!
    • tar_outdated() returns names of outdated targets

    • tar_visnetwork() visualizes pipeline and shows out of date targets


[Screenshot: tar_visnetwork() highlighting out-of-date targets]

12 of 15

Debugging

  • Different because the pipeline is not run interactively
  • The layers that make targets good for reproducibility and scaling also make it harder to debug
  • If a target has an error, tar_make() will return the error message
  • You can run the working parts by setting tar_option_set(error = "null"), which returns NULL for errored targets
    • Note: outputs are not up to date or correct, but this lets you look at them!


13 of 15

Debugging steps

  • The metadata file _targets/meta/meta stores the most recent error
    • tar_meta() can retrieve error messages
  • Look at your functions
    • Most errors are in user-defined functions
  • Pause the pipeline with browser()
  • Personal approach: step through functions as you normally would
  • tar_destroy() removes the _targets folder so you can start fresh - not best practice though!
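A sketch of these debugging steps (the target name "data" is a placeholder):

```r
library(targets)

tar_meta(fields = error, complete_only = TRUE)  # retrieve stored error messages

# In _targets.R, let the rest of the pipeline run past a failing target:
# tar_option_set(error = "null")

# Drop into an interactive browser() inside a failing target:
# tar_option_set(debug = "data")   # set in _targets.R
# tar_make(callr_function = NULL)  # run in the current R session
```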


14 of 15

More advanced info and resources

  • _targets.R setup has code for parallel processing
    • If running on a cluster, use_targets() would have detected this and set up _targets.R for parallel processing

CHECK THESE OUT!

  1. The {targets} R package user manual
  2. Get started with {targets} in 4 minutes
  3. https://docs.ropensci.org/targets/
  4. Will Landau - Reproducible Computation at Scale in R with Targets [Remote]
  5. Reproducible Computation at Scale in R with {targets}
  6. FULL TUTORIAL: Build a Full Production Forecasting Workflow in R with Targets & Modeltime
    1. Includes parallelization (see table of contents)
  7. R {targets}: How to Make Reproducible Pipelines for Data Science and Machine Learning - Machine Learning, R programming


15 of 15

Demo

  • Data: listen history from https://www.last.fm/ (~2.5 weeks)
  • Workflow
    • Load input file
    • Clean input
    • Summarize data
    • Create visualizations
