ESM 206 - Lecture 3
Part 1: Some coding style considerations
Part 2: Tidy data format, tidyr::pivot_*
Part 3: Intro to R Markdown
Part 4: Data structures in R
Lab 1 recap:
This is what you’ll do over and over (and over and over) for Assignment 1.
Part 1: Some coding considerations
See: The tidyverse Style Guide by Hadley Wickham
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.”
Why?
Who is a collaborator?
Some important things:
Code structure
EW:
object_name <- df %>% filter(col_a == "yes", col_b == "burritos", col_c != "eggplant") %>% select(col_b:col_e) %>% mutate(new_col = col_f + col_g) %>% group_by(col_b, col_d) %>% summarize(new_col_2 = mean(new_col))
PHEW:
object_name <- df %>%
  filter(col_a == "yes",
         col_b == "burritos",
         col_c != "eggplant") %>%
  select(col_b:col_e) %>%
  mutate(new_col = col_f + col_g) %>%
  group_by(col_b, col_d) %>%
  summarize(new_col_2 = mean(new_col))
Consider pressing ‘Return’:
ggplot(df, aes(x = temp, y = salinity)) +
  geom_point(color = "blue",
             size = 2,
             pch = 18,
             alpha = 0.2)
Code spacing (from the Tidyverse style guide)
Rough: select(df, temp , ph ,salinity)
Better: select(df, temp, ph, salinity)
Rough: filter ( df, temp > 10, ph < 8 )
Better: filter(df, temp > 10, ph < 8)
Rough: pika_mass <- age*4.3+0.2
Better: pika_mass <- age * 4.3 + 0.2
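The line-break and spacing conventions above can be tried on a small data frame. This is only a sketch: the data frame `df`, its values, and the names `site_summary`, `temp_f`, and `mean_temp_f` are invented for illustration and do not come from the lecture data.

```r
library(dplyr)

# Hypothetical data frame, invented for illustration only
df <- data.frame(
  site = c("a", "a", "b", "b"),
  temp = c(12.1, 14.3, 9.8, 11.0),
  ph = c(8.1, 7.9, 8.3, 8.0),
  salinity = c(33.5, 34.1, 32.8, 33.0)
)

# One step per line, a space after each comma, spaces around operators
site_summary <- df %>%
  filter(ph >= 8.0) %>%
  mutate(temp_f = temp * 9 / 5 + 32) %>%
  group_by(site) %>%
  summarize(mean_temp_f = mean(temp_f))

site_summary
```

Writing the same pipeline on a single line would give the same result; the layout only changes how easy it is to read and debug.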
Part 2: Tidy data format, tidyr::pivot_*
What is tidy data?
From R for Data Science by Grolemund & Wickham:
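To be “tidy”:
1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.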
A variable is a characteristic that is being measured, counted or described with data. Like: car type, salinity, year, population, or whale mass.
An observation is a single “data point” for which the measure, count or description of one or more variables is recorded. For example, if you are recording variables height, mass, and color of dragons, then each dragon is an observation.
A value is the recorded measure, count or description of a variable.
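The dragon example can be written out as a small data frame to make the three terms concrete. This is a sketch with invented names and numbers; the column names `height_m` and `mass_kg` are assumptions chosen for illustration.

```r
# Hypothetical dragons; names and measurements are invented
dragons <- data.frame(
  dragon = c("Smaug", "Toothless", "Puff"),
  height_m = c(18.0, 1.5, 3.2),
  mass_kg = c(9000, 250, 600),
  color = c("red", "black", "green")
)

# Each row is one observation (one dragon)
nrow(dragons)

# Each column is one variable
names(dragons)

# Each cell holds one value, e.g. the mass of one dragon
dragons$mass_kg[dragons$dragon == "Puff"]
```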
Tidy data schematic, from R for Data Science by Grolemund & Wickham:
An example of tidy data:
Why isn’t this data “tidy”?
What would it look like if we made it tidy?
What might you call the variables?
To make this tidy, we lengthen it with tidyr::pivot_longer() (which supersedes gather())
(i.e. convert from wide to long format)
Why isn’t this data frame “tidy”?
Sketch what it would look like if it were tidy.
Question: in what way is this df not tidy? What would it look like if you made it tidy?
Example: wide-to-long
Fictional data frame ‘dogs’: how many miles Teddy & Khora run
dogs_longer <- dogs %>%
  pivot_longer(cols = week_1:week_3,
               names_to = "week",
               values_to = "miles")
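A self-contained version of the wide-to-long example might look like the sketch below. The mileage values are invented, and the `name` column and the `dogs_wider` round trip with pivot_wider() are assumptions added for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format data: one column per week (values invented)
dogs <- data.frame(
  name = c("Teddy", "Khora"),
  week_1 = c(5, 10),
  week_2 = c(8, 12),
  week_3 = c(6, 9)
)

# Wide to long: week column names become values in 'week',
# and the cell contents go into 'miles'
dogs_longer <- dogs %>%
  pivot_longer(cols = week_1:week_3,
               names_to = "week",
               values_to = "miles")

# Long back to wide, to show the two functions mirror each other
dogs_wider <- dogs_longer %>%
  pivot_wider(names_from = week,
              values_from = miles)

dogs_longer
```

Note that names_to and values_to take quoted strings because those columns do not exist yet, while names_from and values_from refer to existing columns and can be unquoted.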
Part 3: Meet R Markdown
Our workflow so far:
Why that’s a good step forward:
What are some good next steps?
Scripts: Great when…
Maybe not so great if…
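From R4DS:
“R Markdown files are designed to be used in three ways:
1. For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.
2. For collaborating with other data scientists (including future you!), who are interested in both your conclusions, and how you reached them (i.e. the code).
3. As an environment in which to do data science, as a modern day lab notebook where you can capture not only what you did, but also what you were thinking.”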
What is markdown?
Markdown is a lightweight syntax for adding formatting to plain text, so that when the text is converted (e.g. to HTML), it is formatted as requested.
Basically: instead of highlighting text and clicking formatting options (as in Word), we add symbols and syntax to plain text, which are rendered as formatting when the text is converted to an HTML (or PDF, or Word…) document.
With R Markdown, you can have your reproducible code, text, and outputs all in one place.
Less copying from code to report = less opportunity for error + less time + fewer files
When working in R Markdown:
OK. Then what happens? When you KNIT (Cmd + Shift + K on Mac, Ctrl + Shift + K on Windows):
.Rmd > (knitr) > .md > (pandoc) > .html
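The first arrow in that pipeline can be run by hand, which may help show what knitting does. The sketch below writes a tiny, made-up .Rmd file to a temporary location and calls knitr::knit() to produce the intermediate .md file; the file contents and object names are assumptions for illustration. The final pandoc step is only attempted if the rmarkdown package and pandoc are available.

```r
library(knitr)

# Build a chunk fence programmatically (three backticks)
fence <- strrep("`", 3)

# A minimal R Markdown source: YAML header, markdown text, one R chunk
rmd_lines <- c(
  "---",
  "title: \"Knit demo\"",
  "output: html_document",
  "---",
  "",
  "## A heading",
  "",
  "Some *italic* and **bold** text.",
  "",
  paste0(fence, "{r}"),
  "mean(c(1, 2, 3))",
  fence
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Step 1: knitr runs the chunks and writes code plus output to a .md file
md_file <- knit(rmd_file,
                output = sub("\\.Rmd$", ".md", rmd_file),
                quiet = TRUE)
md_text <- readLines(md_file)

# Step 2 (optional): rmarkdown::render() runs knitr and then pandoc
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

Inspecting md_text shows the markdown text passed through unchanged, with the chunk's code and its printed result inserted as plain markdown code blocks.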
Note: To have your knitted preview show up in the ‘Viewer’ tab within RStudio:
Tools > Global Options > R Markdown > Show output preview in > (select “Viewer Pane”)
Part 4: Common data structures in R
We often read in external data.
We can also make vectors & DFs in R.
Make a vector by combining elements with c():
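A few vectors made with c(), as a minimal sketch; the object names and values are invented for illustration.

```r
# Character vector
dog_names <- c("Teddy", "Khora")

# Numeric vector
miles <- c(5.2, 10.1)

# c() also combines existing vectors with new elements
more_miles <- c(miles, 3)

# A vector holds a single type: mixing types coerces everything
mixed <- c(1, "a", TRUE)

class(dog_names)
length(more_miles)
class(mixed)
```

Because of that coercion, a number stored alongside text in one vector becomes text, which is a common source of surprises when reading in messy data.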
There are a number of ways to make DFs:
You can use rbind() - “row bind” to bind vectors together in rows, but usually you want to combine vectors as columns. Use data.frame() to bind vectors (of same length) together as columns into a single df:
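The difference between the two approaches can be seen in a short sketch. The vectors `site` and `temp` and their values are made up for illustration.

```r
# Two hypothetical vectors of the same length
site <- c("north", "south", "east")
temp <- c(12.1, 14.3, 9.8)

# data.frame(): each vector becomes a column and keeps its own type
by_col <- data.frame(site, temp)

# rbind(): each vector becomes a row of a matrix, so mixing
# character and numeric vectors coerces everything to character
by_row <- rbind(site, temp)

str(by_col)
dim(by_row)
is.character(by_row)
```

This is one reason binding as columns is usually what you want: each variable stays in its own column with its own type.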
What in the world is a tibble?
Or make one with tribble()
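As a rough illustration, a tibble is the tidyverse's version of a data frame, and it can be built column by column with tibble() or row by row with tribble(). The object names and values below are invented.

```r
library(tibble)

# tibble(): built column by column; later columns can use earlier ones
dogs_tbl <- tibble(
  name = c("Teddy", "Khora"),
  miles = c(5, 10),
  double_miles = miles * 2
)

# tribble(): built row by row, with column names given as ~name
dogs_trb <- tribble(
  ~name, ~miles,
  "Teddy", 5,
  "Khora", 10
)

is_tibble(dogs_tbl)
dogs_trb
```

tribble() is mostly convenient for small tables typed by hand, since the code layout matches the layout of the table.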