Animate your data wrangling
Yesterday I tweeted this gif showing what we can do about non-data grouping rows embedded in the data rectangle, using the ‘unheadr’ package (we can, and we should, move them into their own variable so the data are tidier). Please ignore the typo in the tweet.
manage to animate what we can do about non-data grouping rows embedded in the data rectangle using my silly little #rstats package 📦 https://t.co/QP1X6ORtH8 https://t.co/zJfVslUedN pic.twitter.com/KnAYdSAmc7
— Luis D. Verde (@LuisDVerde) August 12, 2018
There was some interest in the code behind the animation, and I wanted to share it anyway because it’s based on actual data and I think that’s pretty cool.
All of this is possible thanks to Thomas Lin Pedersen’s ‘gganimate’ package, a cool use case with geom_tile() plots by @mikefc, and this post by David Robinson in which he melts a table into long format, with indices for each row and column and a variable holding the value of each cell.
We can use real data from this table, originally from a book chapter about rodent sociobiology by Ojeda et al. (2016). I had a PDF version of the chapter, and I got the data into R following this post by Bob Rudis. I highly recommend ‘pdftools’ and ‘readr’ for importing PDF tables.
The first few lines of the table looked like this, and for this demo we can just set up the data directly as a tibble.
Setting up the data.
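A minimal stand-in for that setup is sketched below so the later steps have something to run on. The family names, genus names, column names and values are placeholders made up for illustration; they are not the data from Ojeda et al. (2016).

```r
library(tibble)

# Hypothetical toy version of the table: placeholder names and values,
# not the published data. Grouping rows hold a family name (ending in
# "dae") in `taxon` and are empty everywhere else.
rodents <- tibble(
  taxon   = c("Exampleidae", "Genus A", "Genus B", "Sampleidae", "Genus C"),
  value_a = c(NA, 1, 2, NA, 3),
  value_b = c(NA, "x", "y", NA, "z")
)

rodents
```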
There are grouping values for the taxonomic families that the different genera belong to, and these are interspersed within the taxon variable. All taxonomic family names end in “dae”, so they are easy to match with a regular expression. Install ‘unheadr’ from GitHub before proceeding.
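The post does this step with ‘unheadr’. To show what the reshaping actually does, here is a dependency-free base-R sketch of the same idea; `untangle_groups()` is a hypothetical helper written for this example, not the package’s code or API. Rows matching the regex become labels that are carried down into a new column, and then the grouping rows themselves are dropped.

```r
# Hypothetical toy table (placeholder values, not the real data)
rodents <- data.frame(
  taxon   = c("Exampleidae", "Genus A", "Genus B", "Sampleidae", "Genus C"),
  value_a = c(NA, 1, 2, NA, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the 'unheadr' API: rows whose `col` value
# matches `regex` are treated as group labels, copied down into a new
# column `new`, and then removed from the table.
untangle_groups <- function(df, regex, col, new) {
  values   <- as.character(df[[col]])
  is_group <- grepl(regex, values)
  labels   <- values[is_group]
  # cumsum() gives each row the index of the closest group row above it
  group_id <- cumsum(is_group)
  # rows that appear before any group row get NA
  df[[new]] <- ifelse(group_id > 0, labels[pmax(group_id, 1)], NA)
  df[!is_group, , drop = FALSE]
}

untangled <- untangle_groups(rodents, "dae$", "taxon", "family")
untangled
```

The `cumsum()` trick works because every grouping row increments the counter, so each data row ends up pointing at the most recent label above it.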
Once we have the original and the ‘untangled’ versions of the table, we define a function (inspired by @drob) that melts a table into long format, and apply it to each one.
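One way to write such a melt function is sketched below: one output row per cell, holding the row index, the column index and the cell contents. This is an assumption about the shape of the function, not the post’s exact code; values are coerced to character so that columns of different types can share one `value` column, and the toy inputs are placeholders.

```r
# Melt a data frame into long format, one row per cell. Columns are
# stacked in order, so row indices repeat and column indices advance.
melt_table <- function(df) {
  data.frame(
    row   = rep(seq_len(nrow(df)), times = ncol(df)),
    col   = rep(seq_len(ncol(df)), each = nrow(df)),
    value = unlist(lapply(df, as.character), use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy inputs (placeholder values)
original  <- data.frame(taxon = c("Exampleidae", "Genus A"),
                        value_a = c(NA, 1), stringsAsFactors = FALSE)
untangled <- data.frame(taxon = "Genus A", value_a = 1,
                        family = "Exampleidae", stringsAsFactors = FALSE)

long_original  <- melt_table(original)
long_untangled <- melt_table(untangled)
long_original
```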
Next, we add two more variables to each long-form table: one for mapping fill colors, and a label for the facets (which can be laid out either in time or in space!).
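A sketch of that step follows. The three fill categories (`"empty"`, `"family"`, `"data"`) and the helper name `add_plot_vars()` are illustrative choices made for this example; the post may map colors differently. The `tstep` label is what later drives either the facets or the animation states.

```r
# Hypothetical long-form tables, shaped like the output of a row/col/value melt
long_original <- data.frame(row = c(1, 2, 1, 2), col = c(1, 1, 2, 2),
                            value = c("Exampleidae", "Genus A", NA, "1"),
                            stringsAsFactors = FALSE)
long_untangled <- data.frame(row = c(1, 1, 1), col = c(1, 2, 3),
                             value = c("Genus A", "1", "Exampleidae"),
                             stringsAsFactors = FALSE)

# Add a fill category for each cell and a label for the facet/state
add_plot_vars <- function(long, step) {
  long$fill <- ifelse(is.na(long$value), "empty",
                      ifelse(grepl("dae$", long$value), "family", "data"))
  long$tstep <- step
  long
}

long_original  <- add_plot_vars(long_original, "original")
long_untangled <- add_plot_vars(long_untangled, "untangled")
long_original
```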
After binding the two tables together, we can plot them with geom_tile() and use the ‘tstep’ variable to show them either side by side or one after the other.
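The side-by-side version can be sketched with ‘ggplot2’ alone: each cell becomes a tile at its (column, row) position, and `facet_wrap(~tstep)` lays the two tables out in space. The tile data are hypothetical placeholders, and the theme and coordinate choices are cosmetic assumptions.

```r
library(ggplot2)

# Hypothetical long-form tables that already carry `fill` and `tstep`
long_original <- data.frame(row = c(1, 2, 1, 2), col = c(1, 1, 2, 2),
                            fill = c("family", "data", "empty", "data"),
                            tstep = "original", stringsAsFactors = FALSE)
long_untangled <- data.frame(row = c(1, 1, 1), col = c(1, 2, 3),
                             fill = c("data", "data", "family"),
                             tstep = "untangled", stringsAsFactors = FALSE)

tiles <- rbind(long_original, long_untangled)
# Fix the order of the facets (and, later, of the animation states)
tiles$tstep <- factor(tiles$tstep, levels = c("original", "untangled"))

p <- ggplot(tiles, aes(x = col, y = row, fill = fill)) +
  geom_tile(color = "white") +
  scale_y_reverse() +   # row 1 at the top, as in a printed table
  facet_wrap(~tstep) +  # lay the two tables out side by side
  coord_equal() +
  theme_void()

p
```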
For now, ‘gganimate’ is only available on GitHub. Once it is installed, transition_states() does all the magic.
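To lay the tables out in time instead of space, the facet can be swapped for `transition_states()`, which steps through the levels of `tstep` and tweens the tiles between them. This is a minimal sketch with hypothetical tile data; the `transition_length` and `state_length` values are illustrative relative durations, not taken from the post.

```r
library(ggplot2)
library(gganimate)

# Hypothetical tile data: two table states stacked in one data frame
tiles <- data.frame(
  row   = c(1, 2, 1, 2, 1, 1, 1),
  col   = c(1, 1, 2, 2, 1, 2, 3),
  fill  = c("family", "data", "empty", "data", "data", "data", "family"),
  tstep = factor(rep(c("original", "untangled"), times = c(4, 3)),
                 levels = c("original", "untangled"))
)

# Same tile plot as before, but tstep is used as time rather than as a facet
anim <- ggplot(tiles, aes(x = col, y = row, fill = fill)) +
  geom_tile(color = "white") +
  scale_y_reverse() +
  coord_equal() +
  theme_void() +
  transition_states(tstep, transition_length = 2, state_length = 1)
```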
Check it out!
Once the animation is rendered we can save it to disk using anim_save().
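A sketch of rendering and saving explicitly is below. The frame count, frame rate, size and output file name are illustrative assumptions, and the default renderer assumes the ‘gifski’ package is installed. Passing the rendered object to anim_save() avoids relying on whichever animation was rendered last.

```r
library(ggplot2)
library(gganimate)

# Hypothetical tile data and animation, as in the transition_states() step
tiles <- data.frame(
  row   = c(1, 2, 1, 2, 1, 1, 1),
  col   = c(1, 1, 2, 2, 1, 2, 3),
  fill  = c("family", "data", "empty", "data", "data", "data", "family"),
  tstep = factor(rep(c("original", "untangled"), times = c(4, 3)),
                 levels = c("original", "untangled"))
)
anim <- ggplot(tiles, aes(x = col, y = row, fill = fill)) +
  geom_tile(color = "white") +
  scale_y_reverse() +
  theme_void() +
  transition_states(tstep, transition_length = 2, state_length = 1)

# Render with illustrative settings; width and height go to the png device
rendered <- animate(anim, nframes = 20, fps = 10, width = 400, height = 300)

# Write the GIF to a temporary location (hypothetical file name)
out <- file.path(tempdir(), "untangle.gif")
anim_save(out, animation = rendered)
```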
This approach seems like a good way to animate various types of common steps in data munging, and it should work nicely to illustrate how several ‘dplyr’ or ‘tidyr’ verbs work. I’ll make more animations in the near future.
Thanks for reading!