SORTEE
Library of Code Mistakes
A collection of code mistakes that SORTEE members shared with each other to (1) normalise coding errors and (2) build a resource of (common) code mistakes that you can use during code review.
How to contribute?
Noticed a coding mistake you or someone else made? Add it to this library to build a resource of (common) code mistakes that people in Ecology & Evolution can use during code review.
Steps:
1. Add a slide and describe the mistake / copy-paste a code chunk (screenshot or text) that contains the mistake.
2. Add a descriptive title and describe how to recognize the mistake.
3. Try to categorize the mistake: (a) drag it under one of the 4R’s of Code review (Reported, Run, Reliable, Reproducible); and (b) identify if the error is conceptual (e.g., implementing the wrong function for a given task), programmatic (e.g., indexing the wrong column of a data frame), or syntactic (e.g., the incorrect spelling of a statement or function).
NB: Mistake reports can be anonymous! You don’t have to add your name, and you can alter the code a bit to make it even more anonymous if you’d like (e.g. replace certain variable names with more generic ones).
Is the code as Reported?
Using a different multiple testing correction than reported
Consequence: 14,000 significant p-values (instead of 0 significant p-values!!). This huge number of significant p-values was not apparent when comparing the p-value distributions of the observed and empirical results.
Type of error: conceptual
Mistake: reported using the empirical FDR (empFDR, see screenshot) procedure, while the code was in fact using a positive false discovery rate (pFDR). Did not know the difference* and only became aware of it after a code review by a colleague!
*Difference: empFDR compares the total number of observed p-values with the total number of empirical p-values (pooled across loci) divided by the number of bootstraps. pFDR compares each observed p-value with the empirical p-values of its own locus, separately for each locus.
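To make the distinction above more concrete, here is a rough sketch on simulated null data of the two kinds of comparison: pooled across loci versus per locus. The object names (obs_p, emp_p), the sizes and the exact formulas are illustrative assumptions, not the code from this example and not a full implementation of either published procedure.

```r
set.seed(42)
n_loci <- 200   # hypothetical number of loci
n_boot <- 100   # hypothetical number of bootstrap replicates

# Observed p-values (one per locus) and empirical null p-values
# (one row per locus, one column per bootstrap); all simulated here
obs_p <- runif(n_loci)
emp_p <- matrix(runif(n_loci * n_boot), nrow = n_loci, ncol = n_boot)

# Pooled comparison (empFDR-style): the number of empirical p-values at or
# below the threshold, summed across all loci and divided by the number of
# bootstraps, relative to the number of observed p-values at or below it
pooled_fdr <- function(threshold, obs, emp, n_boot) {
  expected_false <- sum(emp <= threshold) / n_boot
  observed_hits  <- max(sum(obs <= threshold), 1)  # avoid division by zero
  min(expected_false / observed_hits, 1)
}

# Per-locus comparison: each observed p-value is compared only with the
# empirical p-values of its own locus (row), giving one value per locus
per_locus <- function(obs, emp) {
  rowMeans(emp <= obs)  # obs is recycled down columns, so row i uses obs[i]
}

pooled_fdr(0.05, obs_p, emp_p, n_boot)  # a single FDR estimate for the cut-off
summary(per_locus(obs_p, emp_p))        # one empirical p-value per locus
```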
Does the code Run?
Is the code Reliable?
Indexing the wrong column of a dataframe
Mistake: wanted to give the treatments more meaningful names than those recorded in the source database, but ended up misassigning the names (two treatments were swapped)
Consequence: results were swapped between treatment groups, and thus the interpretation was completely off!!
Type of error: programmatic
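One way to make this kind of recoding harder to get wrong is to map the original codes to new names with an explicit lookup keyed by the codes themselves, rather than by position, and then cross-tabulate old against new. A minimal sketch; the data frame, the codes (T1 to T3) and the treatment names are hypothetical.

```r
# Hypothetical data with treatment codes as recorded in the database
dat <- data.frame(treatment = c("T1", "T2", "T1", "T3", "T2"))

# Error-prone: renaming factor levels by position silently depends on the
# level order, e.g. levels(f) <- c("control", "warm", "cold")

# Safer: an explicit lookup from original code to new name
lookup <- c(T1 = "control", T2 = "warm", T3 = "cold")

# as.character() guards against factors, which would otherwise be
# indexed by their integer codes instead of by name
dat$treatment_name <- unname(lookup[as.character(dat$treatment)])

# Check: every original code should map to a new name, with no NAs,
# and the cross-table should show exactly one new name per code
stopifnot(!anyNA(dat$treatment_name))
table(dat$treatment, dat$treatment_name)
```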
Wrong use of the mean() function
Mistake: wanted to calculate the mean of 2 values but forgot to combine them into a vector with c()
Example:
> mean(2,4)
[1] 2
> mean(c(2,4))
[1] 3
Consequence: the mean was calculated from the first value only; the second value was not averaged but silently matched to mean()'s trim argument
Type of error: programmatic
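Why does mean(2,4) return 2? The second argument of mean() is trim, so 4 is matched to trim rather than treated as data; with trim >= 0.5 the default method returns the median of x, which here is just 2. Functions such as sum() and max() take their inputs through `...`, which makes the inconsistency easy to miss:

```r
mean(2, 4)          # 4 is matched to `trim`, so only x = 2 is used: 2
mean(2, trim = 4)   # the same call with the matching written out: 2
mean(c(2, 4))       # intended: both values in one vector: 3

# sum() and max() collect their inputs through `...`, so these work as expected
sum(2, 4)           # 6
max(2, 4)           # 4
```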
Mistakes with matrix algebra in R
[Screenshots: “What I expected” vs. “What I got”]
Tried to implement code to replicate the analyses from Sztepanacz & Houle 2019 and compare the response to selection in males and females. Made mistakes when translating the equations into R. The specific code errors are now lost to time… like tears… in the rain (i.e. I did not version control this one, sorry). Link to code repo here.
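The specific errors here are lost, but two generic pitfalls when translating matrix equations into R are using `*` (element-wise product) where `%*%` (matrix product) is meant, and `1 / G` (element-wise reciprocal) where the matrix inverse `solve(G)` is meant. A minimal sketch, assuming the multivariate breeder's equation Δz = Gβ for the response to selection; the G matrix and selection gradient β are made-up values, not taken from Sztepanacz & Houle (2019).

```r
# Hypothetical G matrix (additive genetic (co)variances) for two traits
G <- matrix(c(1.0, 0.3,
              0.3, 0.5), nrow = 2, byrow = TRUE)
# Hypothetical selection gradient
beta <- c(0.2, -0.1)

# Wrong: element-wise product; beta is recycled down the columns of G
G * beta

# Right: matrix product, response to selection delta_z = G %*% beta
delta_z <- G %*% beta   # 2 x 1 matrix: 0.17, 0.01

# Wrong: element-wise reciprocal of each entry
1 / G
# Right: matrix inverse, so that solve(G) %*% G is the identity matrix
G_inv <- solve(G)
round(G_inv %*% G, 10)
```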
Unexpected result of as.Date()
Mistake: used as.Date() to convert a date-time vector in a time zone other than UTC to dates
Example:
> t <- as.POSIXct(c("2023-01-02 13:00:00", "2023-01-03 22:00:00"))
> t
[1] "2023-01-02 13:00:00 CST" "2023-01-03 22:00:00 CST"
> as.Date(t)
[1] "2023-01-02" "2023-01-04"
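Consequence: as.Date() converted the times to UTC before taking the date (tz = "UTC" was its default in the R version used here). Because the times are in CST (North American Central Standard Time, UTC-6), any time from 18:00 onwards falls on the following day in UTC. This is why the second date, “2023-01-03” at 22:00 CST (04:00 UTC the next day), is converted to “2023-01-04”, which is usually not what you want.
Solution: use as_date() from the lubridate package, which by default uses the time zone of the date-time object, or pass the time zone to as.Date() explicitly (e.g. as.Date(t, tz = "America/Chicago"))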
Type of error: programmatic
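A base-R sketch of the same pitfall with the time zone set explicitly, so that it behaves the same on any machine regardless of the session time zone. Passing tz to as.Date(), or formatting the local date first, keeps the local calendar date. "America/Chicago" is an assumed time zone name for the CST times in the example.

```r
# The example's times, with the time zone set explicitly rather than taken
# from the session (assumed: US Central, "America/Chicago")
t <- as.POSIXct(c("2023-01-02 13:00:00", "2023-01-03 22:00:00"),
                tz = "America/Chicago")

as.Date(t, tz = "UTC")              # "2023-01-02" "2023-01-04": dates in UTC
as.Date(t, tz = "America/Chicago")  # "2023-01-02" "2023-01-03": local dates
as.Date(format(t, "%Y-%m-%d"))      # local dates via the formatted string

# With lubridate installed, as_date() uses the time zone stored in t:
# lubridate::as_date(t)
```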
Are the results the code produces Reproducible?
Not documenting software versions…
Mistake: did not document R package versions during the analysis. Many months later, when rerunning the code one last time before depositing the data and code alongside the manuscript submission, noticed that the post hoc test p-values had changed.
Consequence: the newer version of the R package used an updated multiple testing correction method, giving p-values that differed from those reported in the manuscript
Type of error: conceptual?
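A minimal habit that would have caught this: write the R version and package versions to a file next to the results every time the analysis is run. The file name is a hypothetical choice; the renv package (renv::snapshot()) is one option for a fuller, restorable record.

```r
# Save the R version, platform and attached/loaded package versions
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")

# Report the version of a specific package, e.g. in the methods section
packageVersion("stats")

# For a complete, restorable record of package versions, the renv package
# can write a lockfile: renv::snapshot()  (requires renv to be installed)
```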
Unreliable behaviour of a time/date conversion function
Mistake: used the as.POSIXct() function in R. Depending on the input and the time zone set, it converts the time back to GMT, but it behaved unpredictably: it sometimes shifted times recorded in Central European Summer Time (CEST, GMT+2h) to GMT+4h instead of GMT, or the other way around (GMT to GMT+2h).
Consequence: results were off by 2 hours
→ Double-check the timestamp column after every conversion step
Type of error: programmatic
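A sketch of the tip above: parse with an explicit format and tz instead of relying on the session default, convert by changing only the display time zone, and check the offset right after the conversion. The timestamps and the "Europe/Berlin" time zone are assumptions for illustration.

```r
# Hypothetical timestamps recorded in Central European Summer Time
raw <- c("2023-06-01 12:00:00", "2023-06-01 23:30:00")
t_local <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")

# Convert to GMT/UTC by changing only the time zone attribute; the underlying
# instant (seconds since 1970-01-01 UTC) stays the same
t_utc <- t_local
attr(t_utc, "tzone") <- "UTC"

format(t_local, "%Y-%m-%d %H:%M")  # "2023-06-01 12:00" "2023-06-01 23:30"
format(t_utc, "%Y-%m-%d %H:%M")    # "2023-06-01 10:00" "2023-06-01 21:30"

# Check after the conversion: read the local clock time as if it were UTC
# and compare with the converted times; in summer the gap should be 2 hours
local_clock_as_utc <- as.POSIXct(format(t_local, "%Y-%m-%d %H:%M:%S"),
                                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
offset_h <- as.numeric(difftime(local_clock_as_utc, t_utc, units = "hours"))
stopifnot(all(offset_h == 2))
```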