Does anyone have an example of a document outlining "best practices" for how to prepare useful archived code (well annotated, etc) to go with data repositories?
— Daniel Bolnick (@DanielBolnick) November 21, 2021
This tweet by Daniel Bolnick received many useful replies with resources and guidelines for sharing code that’s useful and readable. Shortly afterwards, the editorial team at The American Naturalist published this guide for archiving data and code in ecology, evolution and behavior.
Coincidentally, I taught some relevant tools for this at a workshop last September and never wrote about it for this site. With a few packages and IDE tools, we can clean up our code efficiently, which goes a long way towards meeting the latest guidelines for clean, reproducible code.
Here’s a brief tour through my favorite tools:
packup
In interactive sessions and for less-structured workflows, I often add library() calls partway through a script as soon as I realize I need a function, instead of scrolling up to keep all the package load calls together at the top. packup by Miles McBain provides an RStudio addin that moves all these calls to the top of the script, removes any duplicates, and sorts them alphabetically. Packup library() calls works for both .R and .Rmd files, and I see no downside to calling it before sharing a script. Just make sure that the reordering isn’t causing namespace conflicts (a package loaded later masks same-named functions from packages loaded earlier).
With packup we can easily go from this:
to this:
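As a rough illustration of that transformation, here is a base R sketch applied to a made-up script. It only imitates what the addin does (collect the library() calls, drop duplicates, sort them, move them to the top) and is not packup's actual implementation. The script contents and object names are invented for the example.

```r
# Hypothetical script with library() calls scattered through it
script <- c(
  "library(ggplot2)",
  "dat <- read.csv('data.csv')",
  "library(dplyr)",
  "dat <- filter(dat, !is.na(x))",
  "library(ggplot2)",
  "ggplot(dat, aes(x, y)) + geom_point()"
)

# Find the lines that are library() calls
is_lib <- grepl("^\\s*library\\(", script)

# Drop duplicates and sort alphabetically
lib_calls <- sort(unique(trimws(script[is_lib])))

# Put the load calls first, then a blank line, then the rest of the script
tidied <- c(lib_calls, "", script[!is_lib])

writeLines(tidied)
```

The tidied version starts with the two unique library() calls in alphabetical order, followed by the untouched remainder of the script.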
Bonus tip: In RStudio, use Alt + the up or down arrow key to move whole lines of code up or down (I did this to nudge the commented line to the very top of the text).
annotater
The functions and RStudio addins in annotater are only a small step towards reproducibility (for the real deal, I suggest renv), but after some time using them, I think they’ve proven their worth. Whenever I share code (including for the posts on this site), I use the Annotate package repository sources in active file addin to automatically annotate my library load calls with the source of the packages being loaded (e.g. CRAN, GitHub, Bioconductor, etc.) and the version number.
annotater also has a cool function that notes which functions are called from each of the packages loaded in a file with library() calls. This can help avoid dependency issues and spares others from installing packages they don’t need to run the code in that particular file.
Use the Annotate package repository sources in active file addin to turn this:
into this:
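To give a feel for where this kind of annotation comes from, here is a base R sketch that looks up the installed version and recorded source of a package and builds an annotated load call. The helper name `annotate_library_call` is hypothetical and written only for this illustration. It is not part of annotater, the comment format it produces is an assumption, and it does not try to detect GitHub or Bioconductor installs.

```r
# Hypothetical helper for illustration; not part of annotater.
# Assumes the package is already installed.
annotate_library_call <- function(pkg) {
  desc <- utils::packageDescription(pkg)
  source_label <- if (!is.null(desc$Repository)) {
    desc$Repository # e.g. "CRAN" for packages installed from CRAN
  } else if (identical(desc$Priority, "base")) {
    "base R" # packages that ship with R itself
  } else {
    "unknown source"
  }
  sprintf("library(%s) # %s v%s", pkg, source_label, desc$Version)
}

# stats ships with R, so this works without installing anything
annotate_library_call("stats")
```

The addin saves us from doing this lookup by hand for every package in a script.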
styler
styler can help us format our code consistently, following a style guide that generally leads to cleaner, easier-to-read code (e.g., correct indentation, spaces after commas and around infix operators but not after the opening parenthesis of a function call, etc.). We can style a selection or an entire file.
This mangled code has some spacing and style issues:
styler functions can help with that, returning the code below. Note that I normally don’t use the built-in tidyverse style guide (derived from the Google Style guide for R code) for ggplot code (too many line breaks after opening parens).
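The same styling is also available from the console. Below is a minimal sketch, assuming styler is installed. The `messy` snippet is made up for this illustration and is not the example from this post.

```r
# Made-up snippet with cramped and inconsistent spacing
messy <- c(
  "x<-c(1,2,3)",
  "mean( x ,na.rm=TRUE)"
)

if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() applies the tidyverse style guide by default
  styled <- styler::style_text(messy)
  print(styled)
} else {
  message("Install styler first: install.packages('styler')")
}
```

styler::style_file() does the same for a file on disk, which is handy for styling scripts outside of RStudio.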
Code sections
RStudio (and other IDEs) let us insert foldable sections that split scripts into discrete pieces. Sections can be collapsed, and we can jump between them using the little navigation panel that can be toggled on and off in the source pane. RStudio builds a section automatically for any comment line ending in at least four dashes, like so:
becomes
We can also insert these sections with Ctrl+Shift+R (Cmd+Shift+R on a Mac), and for longer scripts they bring more sanity.
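As a small sketch of what such a script can look like, each comment below ends in four dashes, so RStudio treats it as a section header. The data and section names are made up for this illustration.

```r
# Load data ----
dat <- data.frame(site = c("a", "b", "c"), count = c(4, 9, 2))

# Summarise ----
total <- sum(dat$count)

# Report ----
message("Total count: ", total)
```

Each of the three headers shows up in the navigation panel, and the code under it can be folded away.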
littleboxes
Lastly, littleboxes by the ThinkR squad gives us an addin for creating text titles with fancy ASCII art boxes around them. I like to use these at the beginning of scripts, to include a date, the purpose of the file, and the author.
With the example from earlier, we can add titles, comments, dates or whatever, then select these lines and call Little Boxes to add a fancy frame.
This:
Becomes this:
This animation shows how we can use these tools sequentially to clean up our code for sharing.
Try these tools out!