ESM 206 - Lecture 2
Part 1: Naming objects & entering data
Part 2: R troubleshooting & resources
First, some terms:
Part 1: Naming things & entering data
Some resources for basics of data science & coding good practices:
Karl W. Broman & Kara H. Woo (2018) Data Organization in Spreadsheets, The American Statistician, 72:1, 2-10, DOI: 10.1080/00031305.2017.1375989
Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, Teal TK (2017) Good enough practices in scientific computing. PLoS Comput Biol 13(6): e1005510.
Hadley Wickham, Advanced R: Style Guide (code-specific)
“Call him Voldemort, Harry. Always use the proper name for things.” (Albus Dumbledore, Harry Potter and the Sorcerer's Stone)
“There are only two hard things in Computer Science: cache invalidation, and naming things.” (attributed to Phil Karlton)
Naming things
When naming variables, observations, data frames, or files, make them:
Other naming considerations:
Entering things
We’ll consider three bins for now:
Bad:
Better:
Keep a clear record of your naming system, and plan on forgetting your naming system.
Entering dates/times
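A minimal sketch of why an unambiguous date format matters, assuming dates are entered as ISO 8601 text (YYYY-MM-DD), the format Broman & Woo (2018) recommend. The example values and object names are made up for illustration.

```r
# Hypothetical date entries typed as ISO 8601 text (YYYY-MM-DD)
date_text <- c("2019-10-02", "2019-10-15")

# as.Date() parses this format without needing a format string
obs_date <- as.Date(date_text)

class(obs_date)   # "Date"
diff(obs_date)    # time difference between the two dates (13 days)

# An ambiguous entry needs an explicit format, and a wrong guess
# silently produces a different date
ambiguous <- "02/10/2019"
as.Date(ambiguous, format = "%m/%d/%Y")   # read as February 10, 2019
as.Date(ambiguous, format = "%d/%m/%Y")   # read as October 2, 2019
```

Both readings of the ambiguous entry run without any error, which is why the format is worth settling at data entry.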
Part 2: Help you help yourself in R:
Finding resources & troubleshooting tips
Fun fact: “...the very first instance of a computer bug was recorded at 3:45 pm (15:45) on the 9th of September 1947. This "bug" was an actual real-life, well ex-moth, that was extracted from the number 70 relay, Panel F, of the Harvard Mark II Aiken Relay Calculator.” (Christopher McFadden, Interesting Engineering)
If you’re asking: What package or function should I use to do this thing?
TROUBLESHOOTING, A FACT:
How do I know there’s an error, and where to look for it?
Sometimes R tries to give you some hints that things are awry
Error messages will show up* in the Console when you try to run the broken code:
*usually/hopefully
There are multiple types of messages that R will print. Read the message to figure out what it’s trying to tell you.
Error: There’s a fatal error in your code that prevented it from being run through successfully. You need to fix it for the code to run.
Warning: A non-fatal problem. The code still runs, but R is flagging a potential issue that you should know about.
Message: Here’s some helpful information about the code you just ran (you can hide these if you want to).
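To see the three message types side by side, here is a small sketch using base R's stop(), warning(), and message(). The function check_depth() and its argument are hypothetical, written only for this illustration.

```r
# A hypothetical function that signals each of the three condition types
check_depth <- function(depth_m) {
  if (!is.numeric(depth_m)) {
    stop("depth_m must be numeric")        # Error: execution stops here
  }
  if (any(depth_m < 0)) {
    warning("negative depths found")       # Warning: code keeps running
  }
  message("checked ", length(depth_m), " values")   # Message: information only
  mean(depth_m)
}

check_depth(c(1, 2, 3))                    # prints a message, returns 2
suppressMessages(check_depth(c(1, 2, 3)))  # same result, message hidden

# tryCatch() captures the error text instead of halting the script
error_text <- tryCatch(
  check_depth("deep"),
  error = function(e) conditionMessage(e)
)
error_text                                 # "depth_m must be numeric"
```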
When you get an error message in R:
Some common errors/issues to keep an eye out for at the beginning (and forever and ever...)
If R...can’t find a function that you know exists:
Symptom: ‘Error in _________: could not find function “_________”’
Likely diagnoses:
Possible solutions:
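A short sketch of reproducing and checking this error in base R. It assumes two common causes, a misspelled function name or a package that has not been attached; those causes are my assumption here, not a list taken from the slide. The misspelled name summry is deliberate.

```r
# Calling a function R cannot find; the name is deliberately misspelled
msg <- tryCatch(
  summry(c(1, 2, 3)),
  error = function(e) conditionMessage(e)
)
msg   # could not find function "summry"

# exists() reports whether something with that name is currently visible
exists("summry")    # FALSE: misspelled
exists("summary")   # TRUE: base R function

# For a function that lives in a package, attach the package first with
# library(package_name), or call it as package_name::function_name()
stats::median(c(1, 2, 3))
```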
If R...can’t find the pipe operator:
Symptom: ‘Error in ____ %>% ____ : could not find function "%>%"’
Likely diagnoses:
Possible solutions:
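The pipe operator comes from the magrittr package and is also made available by dplyr and the tidyverse. A minimal check, assuming magrittr is installed (the guard simply skips the example if it is not):

```r
# Is the pipe currently available in this session?
pipe_available <- exists("%>%")
pipe_available

# Attaching a package that provides the pipe makes it available.
# library(tidyverse) or library(dplyr) would also work if installed.
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  total <- c(1, 4, 9) %>% sqrt() %>% sum()   # 1 + 2 + 3 = 6
  total
}
```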
If R...can’t find an object (e.g. a data frame or variable) that you know you’ve stored:
Symptom: ‘Error in ____ : object ‘_____’ not found’
Likely diagnoses:
Possible solutions:
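One way this error can arise is a mismatch in spelling or capitalization, since R is case sensitive. The sketch below assumes that cause; fish_counts is a made-up object name.

```r
# Store an object, then ask for it with the wrong capitalization
fish_counts <- c(12, 7, 30)

msg <- tryCatch(
  mean(Fish_counts),              # R is case sensitive: F is not f
  error = function(e) conditionMessage(e)
)
msg   # object 'Fish_counts' not found

# Check which names actually exist before assuming the object is stored
exists("Fish_counts")             # FALSE
exists("fish_counts")             # TRUE
mean(fish_counts)                 # works once the name matches
```

In RStudio, the Environment tab shows the same information as exists() and ls().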
If R…tells you it’s ignoring an argument within a function:
Symptom: ‘Warning: Ignoring unknown parameters: ____’
Possible diagnoses:
Possible solutions:
How to find out what arguments are accepted by which functions:
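A few base R ways to inspect a function's arguments, shown here with sd() as an arbitrary example:

```r
# Open the help page for a function (in RStudio it appears in the Help tab)
# ?sd
# help("sd")

# args() prints a function's arguments and their defaults
args(sd)

# formals() returns the same information as a named list
names(formals(sd))    # "x" "na.rm"
formals(sd)$na.rm     # FALSE, the default
```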
If you…are trying to make a basic ggplot2 graph and you accidentally use %>% between layers instead of a +:
Symptom: ‘Error: `mapping` must be created by `aes()`
Did you use %>% instead of +?’
Diagnosis:
Possible solutions:
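For comparison, a minimal working ggplot2 call with + between layers. This assumes ggplot2 is installed and uses the built-in mtcars data purely as a stand-in.

```r
library(ggplot2)  # assumes ggplot2 is installed

# Layers are added with +, and the + goes at the END of a line
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p

# Using %>% between layers would pass the whole plot to geom_point() as its
# first argument (mapping), which is what triggers the error in the symptom
```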
If you…think your ggplot code looks perfect and you’re not getting an error message, but only an empty graph is showing up:
Symptom:
Possible diagnoses:
Possible solutions:
(dang)
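One cause worth checking, assumed here rather than taken from the slide, is a plot with no geom layer: ggplot() alone sets up the axes but draws nothing. A sketch using ggplot2 (assumed installed) and the built-in mtcars data:

```r
library(ggplot2)  # assumes ggplot2 is installed

# No geom layer: R draws axes and a blank panel, with no error
p_empty <- ggplot(data = mtcars, aes(x = wt, y = mpg))
length(p_empty$layers)    # 0 layers, so nothing to draw

# Adding a geom gives the plot something to draw
p_points <- p_empty + geom_point()
length(p_points$layers)   # 1
```

A + that starts a new line instead of ending the previous one can have the same effect, because R finishes the first line as a complete plot before reaching the geom.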
If you…are trying to change some aesthetic in a ggplot graph, but you’re getting an error:
Symptom: Error in rep(value[[k]], length.out = n) :
attempt to replicate an object of type 'closure'
Possible diagnoses:
Possible solutions:
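In R, 'closure' is the internal type of a function, so this message usually suggests that a function name ended up where a value was expected (for example, an unquoted word that happens to match a function). The base R sketch below reproduces the message outside ggplot2; treating this as the cause in any particular plot is an assumption.

```r
# 'closure' is R's internal type for a function
typeof(mean)                      # "closure"

# Asking R to repeat a function instead of a value reproduces the message
msg <- tryCatch(
  rep(mean, length.out = 3),
  error = function(e) conditionMessage(e)
)
msg

# Repeating an actual value (here a quoted color name) works as expected
rep("blue", length.out = 3)
```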
If you...are trying to find a summary value for a variable that you know contains numbers, but you’re getting an NA result and/or a warning message:
Symptom(s):
Possible diagnoses:
Possible solutions:
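Two situations that commonly produce an NA summary are missing values in the variable and numbers stored as text. The sketch below shows both with made-up temperature values; whether either applies to your data is something to check with class() and is.na().

```r
# Hypothetical data with one missing value
temps <- c(14.2, NA, 15.1, 13.7)
mean(temps)                  # NA, because one value is missing
mean(temps, na.rm = TRUE)    # ignores the NA

# Numbers stored as text: looks numeric, but the class is character
temps_text <- c("14.2", "15.1", "13.7")
class(temps_text)            # "character"
# mean(temps_text) returns NA with a warning that the argument is not numeric

# Convert to numeric first, then summarize
mean(as.numeric(temps_text))
```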
Can’t figure out what’s going on from the error message directly? My process:
Don’t forget the flip-side!
Just because you don’t get an error message doesn’t mean that you did things correctly - it just means that the code is running.
So LOOK AT YOUR RAW DATA, INTERMEDIATE DATA AND RESULTS - especially just after reading it in and after wrangling steps - to ensure that what you *think* your code is supposed to be doing with/to your data is *actually* what your code is doing with/to your data.
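A few base R functions for looking at data right after reading it in and after a wrangling step. The built-in airquality data stands in for your own file, and the object names df and june are placeholders.

```r
# Built-in example data standing in for a file you just read in
df <- airquality

dim(df)              # rows and columns: 153 and 6
head(df)             # first six rows
str(df)              # column names and classes
summary(df)          # ranges and NA counts per column
colSums(is.na(df))   # missing values in each column

# After a wrangling step, check that the result matches your intent
june <- subset(df, Month == 6)
unique(june$Month)   # should contain only 6
nrow(june)           # 30
```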
There are often many solutions that work. Favor the ones that are clear, well organized, and written in consistent, familiar syntax.