1 of 15

An introduction to targets for R

R-Ladies Santa Barbara

October 11, 2023

Tracey Mangin

1

we have 15 slides!

2 of 15

Agenda

Resources

  • This presentation (hi!) ~20 minutes
    • reasonable econ-seminar style: interrupt with questions (but use your best judgment)
  • Follow along demonstration ~20 minutes
  • Example in groups (or individually, or all together!)
  • There are many resources available online!
  • I pulled information from several sources - thanks! (listed at the end)
  • Check those out!


3 of 15

In a perfect world…


[Flow diagram] import inputs → clean and process data → perform analysis → summarize/visualize outputs → we’re done hooray!

4 of 15

In a perfect world…

Reality…


[Flow diagram, perfect world] import inputs → clean and process data → perform analysis → summarize/visualize outputs → we’re done hooray!

[Flow diagram, reality] the same steps, cycled through again and again: update inputs, change cleaning, change this a thousand times and add a million more steps, “are these right?”, “next time, i promise we’ll be perfect…”

5 of 15

Workflow challenges

  1. Long run times
    1. Rerunning entire pipelines to ensure that items are up to date can take a lot of time
  2. Reproducibility
    1. Hard to know whether saved outputs actually match the current code and inputs


6 of 15

Enter targets package for R

  • Pipeline tool specifically for R
    • Pipeline tools coordinate the pieces of analysis projects (e.g., Make)
  • Keeps track of entire workflow
  • Automatically detects when files or functions change
  • Saves time by only running steps, or targets, that are no longer up to date
  • Ensures that the pipeline is run in the correct order
  • Ensures reproducibility: When targets are up to date, this is evidence that the outputs match the code and inputs
  • More trustworthy and reproducible results
  • Note: targets replaces the R tool drake


7 of 15

target explained

  • Each step in the pipeline = a target
  • Looks and feels like a variable (name)
    • Stores the returned R object in your project folder (not in the environment)
  • Function oriented
  • Target usually creates, analyzes, or summarizes a dataset/analysis
  • Good targets:
    • Large enough to save time when not run
    • Small enough that some are skipped
    • Don’t modify global environment
    • Return a single “value” or R object


function

clean_data <- function(file) {
  data <- read_csv(file) %>%
    filter(!is.na(date_time)) %>%
    mutate(day = weekdays(date_time))
  data
}

target

tar_target(name = data, command = clean_data(file))

other examples

tar_target(name = max_val, command = 16)

tar_target(name = save_data, command = simple_write(data), format = "file")

In tar_target(), name is the target’s name and command is the R code that produces it.

8 of 15

targets setup

  1. Create a project or repo
  2. Create a folder called R
    1. Save script(s) containing functions for analysis in R folder (these scripts are sourced)
  3. Add input files if storing in project
  4. Install targets (once): install.packages("targets")
  5. Run use_targets()
    • Creates required _targets.R file that runs pipeline
  6. Set options (e.g., libraries)
  7. Fill in list() with targets
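Once filled in, the pipeline file might look like the sketch below. The functions clean_data() and plot_listens() are hypothetical stand-ins assumed to be defined in a script in the R folder, and data.csv is a placeholder input file.

```r
# _targets.R: a minimal pipeline sketch (function and file names are hypothetical)
library(targets)

# Source the scripts in R/ that define clean_data() and plot_listens()
tar_source()

# Packages that the targets themselves need
tar_option_set(packages = c("readr", "dplyr", "ggplot2"))

list(
  # Track the raw input file so edits to it invalidate downstream targets
  tar_target(file, "data.csv", format = "file"),
  # Clean and process the raw data
  tar_target(data, clean_data(file)),
  # Visualize the cleaned data
  tar_target(plot, plot_listens(data))
)
```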


9 of 15

Inspect pipeline once targets filled in


functions

tar_manifest() helps check for obvious errors and produces a data frame of info about the targets in the pipeline

tar_visnetwork() visualizes the pipeline workflow

tar_glimpse() visualizes the pipeline workflow faster than tar_visnetwork(), but doesn’t account for progress info

  • to see functions, set targets_only = FALSE
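Put together, a quick inspection session might look like this (run from the project root after _targets.R is filled in):

```r
library(targets)

tar_manifest(fields = command)        # data frame of target names and commands
tar_visnetwork()                      # dependency graph with progress info
tar_glimpse()                         # faster graph, no progress info
tar_visnetwork(targets_only = FALSE)  # also show the functions targets depend on
```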

[Screenshots: example tar_visnetwork() and tar_glimpse() graphs]

10 of 15

Run the pipeline

  • tar_make() runs the pipeline
    • Runs targets in the correct order
    • Only runs targets that are out of date (time saver!)

  • Creates folder called _targets in project
    • Outputs saved in _targets/objects
    • tar_read() prints the output
    • tar_load() loads the output in the environment
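Assuming a pipeline with a target named data (as on the earlier slide), a typical session might be:

```r
library(targets)

tar_make()      # runs only out-of-date targets, in dependency order
tar_read(data)  # print the stored object without attaching it
tar_load(data)  # create `data` in the global environment
```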


11 of 15

Making changes and rerunning

  • Reruns: targets identifies which parts of the pipeline are outdated and only reruns those… a real time saver!
    • tar_outdated() returns names of outdated targets

    • tar_visnetwork() visualizes pipeline and shows out of date targets


[Screenshot: tar_visnetwork() highlighting out-of-date targets]

12 of 15

Debugging

  • Different because the pipeline is not run interactively
  • The layers that make targets good for reproducibility and scaling also make it harder to debug
  • If a target has an error, tar_make() will return the error message
  • You can run the working parts by setting tar_option_set(error = "null"), which returns NULL for errored targets
    • Note: outputs are not up to date or correct, but this lets you look at them!


13 of 15

Debugging steps

  • The metadata file _targets/meta/meta stores the most recent error
    • tar_meta() can retrieve error messages
  • Look at your functions
    • Most errors are in user-defined functions
  • Pause the pipeline with browser()
  • Personal approach: step through functions as you normally would
  • tar_destroy() removes the _targets folder so you can start fresh - not best practice though!
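A sketch of these debugging steps (the target name "data" is a placeholder):

```r
library(targets)

tar_meta(fields = error, complete_only = TRUE)  # retrieve stored error messages

# In _targets.R, let the rest of the pipeline run past a failing target:
# tar_option_set(error = "null")

# Drop into an interactive browser() inside a failing target:
# tar_option_set(debug = "data")   # set in _targets.R
# tar_make(callr_function = NULL)  # run in the current R session
```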


14 of 15

More advanced info and resources

  • _targets.R setup has code for parallel processing
    • If running on a cluster, use_targets() would have detected this and set up _targets.R for parallel processing

CHECK THESE OUT!

  1. The {targets} R package user manual
  2. Get started with {targets} in 4 minutes
  3. https://docs.ropensci.org/targets/
  4. Will Landau - Reproducible Computation at Scale in R with Targets [Remote]
  5. Reproducible Computation at Scale in R with {targets}
  6. FULL TUTORIAL: Build a Full Production Forecasting Workflow in R with Targets & Modeltime
    1. Includes parallelization (see table of contents)
  7. R {targets}: How to Make Reproducible Pipelines for Data Science and Machine Learning - Machine Learning, R programming


15 of 15

Demo

  • Data: listen history from https://www.last.fm/ (~2.5 weeks)
  • Workflow
    • Load input file
    • Clean input
    • Summarize data
    • Create visualizations
