1 of 36

ESM 206 - Lecture 3

Part 1: Some coding style considerations

Part 2: Tidy data format, tidyr::pivot_*

Part 3: Intro to R Markdown

Part 4: Data structures in R

2 of 36

Lab 1 recap:

  1. Make an R Project
  2. Drop data file into it
  3. Create new R Markdown document
  4. Read in data (read_csv)
  5. Do some basic wrangling
  6. Make a plot
  7. Knit
  8. Save your .Rmd

This is what you’ll do over and over (and over and over) for Assignment 1.

3 of 36

Part 1: Some coding style considerations

See: The tidyverse Style Guide by Hadley Wickham

4 of 36

The tidyverse Style Guide by Hadley Wickham

“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.”

5 of 36

Why?

  • Consistent syntax and strict organization will help you and your collaborators work with, update, and provide feedback on your code.
  • Consistency is also useful when troubleshooting: if something looks weird, you’ll spot it more quickly.

Who is a collaborator?

  • Your future self (Lowndes et al. 2017)
  • Current collaborators/coworkers
  • Future collaborators

6 of 36

  • You will develop your own style of coding. But there are some standards that will make it easier for you & everyone else to follow what you did.

  • Some of these things are true for coding in general, some are specific to tidyverse-style functional programming, & some are things that I find useful to keep my own code organized & easy to troubleshoot.

7 of 36

Some important things:

  • Functionality (...it has to work correctly)
  • Reproducibility (.Rproj, {here}, scripts, R Markdown)
  • Organization (subsections, line spacing)
  • Annotation (clear, useful comments)
  • Consistency (spacing, syntax, naming)
  • Elegance (%>%, {purrr}, automating repeated processes, etc.)

8 of 36

Code structure

  • When possible, avoid lines of code > 80 characters long

  • For extended code, consider vertical structure instead of paragraph style

9 of 36

EW:

object_name <- df %>% filter(col_a == "yes", col_b == "burritos", col_c != "eggplant") %>% select(col_b:col_e) %>% mutate(new_col = col_f + col_g) %>% group_by(col_b, col_d) %>% summarize(new_col_2 = mean(new_col))

PHEW:

object_name <- df %>%
  filter(col_a == "yes",
         col_b == "burritos",
         col_c != "eggplant") %>%
  select(col_b:col_e) %>%
  mutate(new_col = col_f + col_g) %>%
  group_by(col_b, col_d) %>%
  summarize(new_col_2 = mean(new_col))

10 of 36

Consider pressing ‘Return’:

  • After any pipe operator %>%

  • For functions containing a lot of arguments, after the comma following each argument

  • After any + sign when creating a graph with ggplot2

ggplot(df, aes(x = temp, y = salinity)) +
  geom_point(color = "blue",
             size = 2,
             pch = 18,
             alpha = 0.2)

11 of 36

Code spacing (from the Tidyverse style guide)

  • “Always put a space after a comma, never before”

Rough: select(df, temp , ph ,salinity)

Better: select(df, temp, ph, salinity)

  • “Don’t put spaces inside or outside parentheses for regular function calls.”

Rough: filter ( df, temp > 10, ph < 8 )

Better: filter(df, temp > 10, ph < 8)

  • “Most infix operators (==, +, -, <-, etc.) should be surrounded by spaces.”

Rough: pika_mass <- age*4.3+0.2

Better: pika_mass <- age * 4.3 + 0.2

12 of 36

Part 2: Tidy data

Tidy Data by Hadley Wickham

13 of 36

What is tidy data?

From R for Data Science by Grolemund & Wickham:

To be “tidy”:

  1. Each variable is a column.
  2. Each observation is a row.
  3. Each value is in its own cell.

14 of 36

A variable is a characteristic that is being measured, counted or described with data. Like: car type, salinity, year, population, or whale mass.

An observation is a single “data point” for which the measure, count or description of one or more variables is recorded. For example, if you are recording variables height, mass, and color of dragons, then each dragon is an observation.

A value is the recorded measure, count or description of a variable.
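
To make the three terms concrete, here is a minimal sketch of the dragon example as a tidy data frame. The dragon IDs and measurements are invented for illustration, and the column names are just one reasonable choice.

```r
# Hypothetical dragons: every number and label here is made up
dragons <- data.frame(
  dragon_id = c("d1", "d2", "d3"),
  height_m  = c(18.3, 4.1, 2.0),
  mass_kg   = c(9500, 780, 310),
  color     = c("red", "black", "green")
)

names(dragons)      # each variable is a column
nrow(dragons)       # each observation (dragon) is a row: 3 rows
dragons$mass_kg[2]  # each value sits in its own cell: 780
```

Adding a fourth dragon means adding one row; recording a new variable (say, wingspan) means adding one column.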

15 of 36

Tidy data schematic, from R for Data Science by Grolemund & Wickham:

16 of 36

An example of tidy data:

17 of 36

Why isn’t this data “tidy”?

What would it look like if we made it tidy?

What might you call the variables?

18 of 36

To make this tidy, we gather it

(i.e. convert from wide to long format)

19 of 36

Why isn’t this data frame “tidy”?

Sketch what it would look like if it were tidy.

20 of 36

Question: in what way is this df not tidy? What would it look like if you made it tidy?

Example: wide-to-long

Fictional data frame ‘dogs’: how many miles Teddy & Khora run

21 of 36

dogs_longer <- dogs %>%
  pivot_longer(week_1:week_3,
               names_to = "week",
               values_to = "miles")
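
A self-contained sketch of this wide-to-long step, in case you want to run it yourself. The miles below are invented (the slide's actual values are not reproduced here), and the `dog` column name is an assumption. Note that `names_to` and `values_to` take quoted strings, since those columns do not exist yet.

```r
library(dplyr)  # for the pipe operator %>%
library(tidyr)  # for pivot_longer() and pivot_wider()

# Hypothetical wide-format data: one column per week
dogs <- data.frame(
  dog    = c("Teddy", "Khora"),
  week_1 = c(10, 15),
  week_2 = c(12, 14),
  week_3 = c(9, 16)
)

# Wide to long: week names go into one column, miles into another
dogs_longer <- dogs %>%
  pivot_longer(week_1:week_3,
               names_to = "week",
               values_to = "miles")

nrow(dogs_longer)  # 6 rows: 2 dogs x 3 weeks

# Long back to wide, the reverse operation
dogs_wider <- dogs_longer %>%
  pivot_wider(names_from = week,
              values_from = miles)
```

In the long version, each row is one dog in one week, so week and miles are each a single variable in a single column.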

22 of 36

Part 3: Meet R Markdown

23 of 36

Our workflow so far:

24 of 36

Why that’s a good step forward:

  • Relative file paths (to top-level working directory)
  • Self-contained, moveable project folder
  • Complete, organized, reproducible & easy-to-follow code

What are some good next steps?

  • Preparing documents for sharing
  • Storing & saving visualizations - still reproducibly

25 of 36

Scripts: Great when…

  • Doing smallish, isolated coding parts of a big project
  • Extensive description/text isn’t required for comments
  • You’re creating a function/tool that will be used for different projects or by different people
  • Alison Hill: “[my] scripts are short and focused, and named according to the specific thing they do so that I can troubleshoot more easily when something goes wrong.”

Maybe not so great if…

  • You want to have text, code & outputs in one place
  • You need to include formatted text and/or equations
  • Code explanations are extensive
  • You plan to publish/share

26 of 36

From R4DS:

“R Markdown files are designed to be used in three ways:

  1. For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.
  2. For collaborating with other data scientists (including future you!), who are interested in both your conclusions, and how you reached them (i.e. the code).
  3. As an environment in which to do data science, as a modern day lab notebook where you can capture not only what you did, but also what you were thinking.”

27 of 36

What is markdown?

Markdown is a way to add formatting to plain text using symbols, so that when you convert it (e.g. to HTML), the text is formatted as requested.

Basically: Instead of highlighting text & clicking formatting options (like in Word), we add symbols & syntax to plain text. When the document is converted to HTML (or PDF, or Word…), that syntax becomes the formatting.

28 of 36

With R Markdown, you can have your reproducible code, text, and outputs all in one place.

Less copying from code to report = Less opportunity for error + Less time + Fewer files

29 of 36

When working in R Markdown:

  • Add text like we’re in a regular text editor, + formatting syntax
  • Put all code in R code chunks (work in code chunks like you’re in a script)

30 of 36

OK. Then what happens? When you KNIT (cmd + Shift + K):

31 of 36

.Rmd > (knitr) > .md > (pandoc) > .html

Note: To have your knitted preview show up in the ‘Viewer’ tab within RStudio:

Tools > Global Options > R Markdown > Show output preview in > (select “Viewer Pane”)
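
The first arrow in that pipeline can be tried by hand. This is a rough sketch, not what the Knit button literally runs: it writes a tiny, made-up .Rmd to a temporary folder and calls knitr::knit() to produce the intermediate .md file. The file contents and object names are assumptions for illustration. To run both steps at once (which needs pandoc installed), rmarkdown::render() is the usual function.

```r
library(knitr)

# Build a tiny R Markdown document in a temporary folder.
# The chunk fence is assembled with strrep() only so it can be typed here.
fence <- strrep("`", 3)
rmd_lines <- c(
  "# A tiny report",
  "",
  "Some *formatted* text, followed by a code chunk:",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

rmd_path <- tempfile(fileext = ".Rmd")
md_path  <- sub("\\.Rmd$", ".md", rmd_path)
writeLines(rmd_lines, rmd_path)

# .Rmd > (knitr) > .md : knitr runs the chunk and writes plain markdown
knit(rmd_path, output = md_path, quiet = TRUE)

# The .md file now holds the text, the code, and the code's output
md_lines <- readLines(md_path)
```

Printing `md_lines` shows the original text plus the chunk and its result, which is what pandoc then converts to HTML.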

32 of 36

Part 4: Common data structures in R

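  • Vector: an ordered collection of elements that all share one class (e.g. all numeric or all character)
  • List: an ordered collection of elements (often vectors) that can differ in class and length
  • Data frame: a list of vectors that all have the same # of elements (each vector is a column)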
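
A quick sketch of how the three structures relate, using base R only. The object names and values are made up for illustration.

```r
# A vector: every element has the same class
masses <- c(4.2, 3.9, 5.1)

# A list: elements can differ in class and in length
pika_site <- list(site = "talus_a", masses = masses, surveyed = TRUE)

# A data frame: equal-length vectors stored as columns
pikas <- data.frame(id = c("p1", "p2", "p3"), mass_g = masses)

class(masses)      # "numeric"
class(pika_site)   # "list"
is.list(pikas)     # TRUE: a data frame is built on top of a list
length(pika_site)  # 3 elements: site, masses, surveyed
```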

33 of 36

We often read in external data.

We can also make vectors & DFs in R.

Make a vector by combining elements with c():
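
For example (values invented for illustration):

```r
# Combine elements into vectors with c()
dog_names <- c("Teddy", "Khora")
miles     <- c(10, 15)

class(dog_names)  # "character"
class(miles)      # "numeric"

# Pull out an element by position
miles[2]  # 15

# Mixing classes in one vector coerces everything to a common class
mixed <- c("Teddy", 10, TRUE)
class(mixed)  # "character"
```

The last lines are worth remembering: a vector holds one class only, so a stray character entry in a numeric column turns the whole thing into text.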

34 of 36

There are a number of ways to make DFs:

You can use rbind() - “row bind” to bind vectors together in rows, but usually you want to combine vectors as columns. Use data.frame() to bind vectors (of same length) together as columns into a single df:
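
A small sketch of the difference, with made-up values. Binding a character vector and a numeric vector as rows forces everything into one class, while data.frame() keeps each column's own class.

```r
dog   <- c("Teddy", "Khora")
miles <- c(10, 15)

# Row bind: each vector becomes a row of a matrix,
# and the numbers are coerced to character
by_row <- rbind(dog, miles)
is.matrix(by_row)     # TRUE
is.character(by_row)  # TRUE

# data.frame(): each vector becomes a column and keeps its class
by_col <- data.frame(dog, miles)
is.numeric(by_col$miles)  # TRUE
dim(by_col)               # 2 rows, 2 columns
```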

35 of 36

What in the world is a tibble?

  1. Tibbles are data frames
  2. Usually used interchangeably
  3. Some updated functionality
  4. Mostly don’t worry about it
  5. Coerce w/ as_tibble() ...

36 of 36

Or make one with tribble()
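
Both routes, sketched with invented values and hypothetical object names. as_tibble() converts an existing data frame; tribble() lets you type a small table row by row, with column names written as ~name.

```r
library(tibble)

# Route 1: coerce an existing data frame
dogs_df  <- data.frame(dog = c("Teddy", "Khora"), miles = c(10, 15))
dogs_tbl <- as_tibble(dogs_df)

is.data.frame(dogs_tbl)  # TRUE: tibbles are data frames

# Route 2: build a tibble row by row
dogs_tribble <- tribble(
  ~dog,    ~miles,
  "Teddy", 10,
  "Khora", 15
)
```

tribble() is mostly handy for small, hand-typed examples because the code is laid out like the table itself.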