
SORTEE

Library of Code Mistakes

A collection of code mistakes that SORTEE members shared with each other to (1) normalise coding errors and (2) build a resource of (common) code mistakes that can be used during code review.


How to contribute?

Noticed a coding mistake you or someone else made? Add it to this library to build a resource of (common) code mistakes that people in Ecology & Evolution can use during code review.

Steps:

1. Add a slide and describe the mistake / copy-paste a code chunk (screenshot or text) that contains the mistake.

2. Add a descriptive title and describe how to recognize the mistake.

3. Try to categorize the mistake: (a) drag it under one of the 4R’s of Code review (Reported, Run, Reliable, Reproducible); and (b) identify if the error is conceptual (e.g., implementing the wrong function for a given task), programmatic (e.g., indexing the wrong column of a data frame), or syntactic (e.g., the incorrect spelling of a statement or function).

NB: The mistake reports can be anonymous! You don’t have to put your name, and you can alter the code a bit to make it even more anonymous if you’d like (e.g. replace certain variable names with more generic ones).


Is the code as Reported?


Using a different multiple testing correction than reported

Consequence: 14,000 significant p-values (instead of 0 significant p-values!!), and this inflation was not apparent when comparing the p-value distributions of the observed and empirical results.

Type of error: conceptual

Mistake: reported using an empirical FDR (empFDR; see screenshot) procedure while actually using a positive false discovery rate (pFDR). Did not know the difference* and only became aware after code review by a colleague!

*Difference: empFDR compares the total number of observed vs. empirical p-values (across loci), divided by the number of bootstraps. pFDR compares each observed p-value to the empirical p-values for that locus separately.
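The distinction can be sketched in R. This is an illustrative reconstruction of the two procedures as described above, not the author's original code; all function and variable names are invented:

```r
# obs_p:  vector of observed p-values, one per locus
# null_p: matrix of empirical (bootstrap) p-values, loci x bootstraps

# empFDR-style: pooled comparison across all loci
emp_fdr <- function(obs_p, null_p, alpha = 0.05) {
  expected_fp <- sum(null_p < alpha) / ncol(null_p)  # false positives per bootstrap
  expected_fp / sum(obs_p < alpha)
}

# pFDR-style: each observed p-value is judged against
# the null distribution of its own locus
per_locus_p <- function(obs_p, null_p) {
  sapply(seq_len(nrow(null_p)), function(i) mean(null_p[i, ] <= obs_p[i]))
}
```

Because the two procedures pool the null distribution differently, they can disagree dramatically on how many tests pass a given threshold.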


Does the code Run?



Is the code Reliable?


Indexing the wrong column of a data frame

Mistake: wanted more meaningful treatment names than those recorded in the source database, but ended up misassigning the names (swapped two treatments)

Consequence: results were swapped between treatment groups, so the interpretation was completely off!!

Type of error: programmatic
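One common way this swap happens is recoding by position rather than by name. A minimal sketch, with invented treatment codes and names for illustration:

```r
df <- data.frame(treatment = c("trt1", "trt2", "trt1", "trt3"))

# Risky: relies on the (alphabetical) order of factor levels,
# so two treatments can silently end up swapped
# levels(df$treatment) <- c("Control", "Drought", "Warming")

# Safer: an explicit named lookup makes each assignment visible
lookup <- c(trt1 = "Control", trt2 = "Drought", trt3 = "Warming")
df$treatment_name <- unname(lookup[df$treatment])
```

A named lookup is also easy to eyeball during code review, because the code-to-name pairing appears on one line.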


Incorrect use of the mean() function

Mistake: wanted to calculate the mean of two values but forgot to combine them into a vector with c()

Example:

> mean(2, 4)
[1] 2
> mean(c(2, 4))
[1] 3

Consequence: the second value was silently matched to mean()'s trim argument rather than averaged, so the result was based on the first value only

Type of error: programmatic


Mistakes with matrix algebra in R

[Screenshots: “What I expected” vs. “What I got”]

Trying to implement code to replicate analyses from Sztepanacz & Houle (2019) and compare the response to selection in males and females. Made mistakes in converting the equations into R. The specific code errors are now lost to time… like tears… in the rain (i.e. I did not version control this one, sorry). Link to code repo here.


Unexpected result of as.Date()

Mistake: used as.Date() to convert a date-time vector to dates while working in a timezone other than UTC

Example:

> t <- as.POSIXct(c("2023-01-02 13:00:00", "2023-01-03 22:00:00"))

> t

[1] "2023-01-02 13:00:00 CST" "2023-01-03 22:00:00 CST"

> as.Date(t)

[1] "2023-01-02" "2023-01-04"

Consequence: as.Date() takes the date in UTC by default. Because the timestamps are in timezone CST (Central North America, UTC-6), 22:00 on 2023-01-03 corresponds to 04:00 UTC on the following day, which is why the second date unexpectedly becomes “2023-01-04”.

Solution: use as_date() from the lubridate package, which takes the date in the timestamp’s own timezone

Type of error: programmatic
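Two possible fixes, sketched below. The timestamps are assumed to carry the CST timezone; "America/Chicago" is used here as an illustrative timezone name:

```r
t <- as.POSIXct(c("2023-01-02 13:00:00", "2023-01-03 22:00:00"),
                tz = "America/Chicago")

# Option 1: tell as.Date() which timezone to take the date in
as.Date(t, tz = "America/Chicago")   # "2023-01-02" "2023-01-03"

# Option 2: lubridate's as_date() uses the timestamp's own timezone
# lubridate::as_date(t)              # "2023-01-02" "2023-01-03"
```

Either way, being explicit about the timezone removes the silent round-trip through UTC.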



Are the results the code produces Reproducible?


Not documenting software versions…

Mistake: did not document R package versions during the analysis. Many months later, when rerunning the code one more time before data/code deposition for manuscript submission, noticed that the post-hoc test p-values had changed.

Consequence: the newer R package version employed an updated method for multiple-testing correction, leading to p-values different from the ones reported in the manuscript

Type of error: conceptual?
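Base R can record the versions in use at analysis time, and the renv package (if installed) can pin them. A minimal sketch:

```r
# Print and save R + package versions alongside the results
writeLines(capture.output(sessionInfo()), "sessionInfo.txt")

# Or pin exact versions with the renv package (assumes renv is installed):
# renv::init()      # set up a project-local package library
# renv::snapshot()  # write renv.lock with the exact versions in use
```

Saving this file with every analysis run makes it possible to diagnose, or avoid entirely, version-driven changes in results.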


Unreliable behaviour of a time/date conversion function

Mistake: the as.POSIXct() function in R. Depending on the input and the timezone that is set, it converts times back to GMT, but it showed unpredictable behaviour: it sometimes converted a time in CET summer time (GMT+2h) to GMT+4h instead, or the other way around (GMT to GMT+2h).

Consequence: results were off by +2 hours

→ Double-check the timestamp column after every conversion step

Type of error: programmatic
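A defensive pattern, following the advice above: always pass tz explicitly and print the result after each conversion. The timezone names below are illustrative:

```r
# Same clock time, two explicit timezones
t_utc <- as.POSIXct("2023-06-01 12:00:00", tz = "UTC")
t_cet <- as.POSIXct("2023-06-01 12:00:00", tz = "Europe/Berlin")

# In summer, Berlin is GMT+2 (CEST), so the two instants differ by 2 hours
difftime(t_utc, t_cet, units = "hours")  # Time difference of 2 hours

# Double-check: format with an explicit timezone after every conversion step
format(t_cet, tz = "UTC", usetz = TRUE)  # "2023-06-01 10:00:00 UTC"
```

Relying on the system default timezone is what makes the behaviour look unpredictable across machines; passing tz makes it deterministic.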
