Here’s a quick workflow that uses a new function in unheadr to annotate meaningful formatting (cell coloring and font formatting) used to designate group membership.
📎 This example spreadsheet (packaged with unheadr) shows some tasks and behaviors that characterize a well-behaved dog, along with their respective scores. These rows belong in three different task types (shown in yellow cells and bold font), and the average score for each task type appears next to each one.
To work with this in R, we need to translate the colors and font faces into something flatter, and then we can pull out these ‘embedded subheaders’ and put them in their own indicator column if we want a tidy data structure.
For educational purposes, this example uses the RverbalExpressions package by Tyler Littlefield to build regular expressions, and I loaded tidylog by Benjamin Elbers to print some feedback about the dplyr operations happening. This is optional.
Once we load (install first if necessary) the required packages, the dog_test spreadsheet is bundled in unheadr and can be loaded with system.file(). We then tell annotate_mf() which variable has the meaningful formatting we want to annotate, and the name of the new annotated variable. This new function calls various useful tools from the tidyxl package (available on CRAN) by Duncan Garmonsway.
This is the resulting tibble:
To match the values that hold annotations, we use a regular expression built using RverbalExpressions as an argument for untangle2(). We also define the variable that has the subheaders and the name of the new variable that now explicitly indicates group membership.
Now let’s clean up the new variable using more regex to remove everything inside brackets (and the brackets too).
select: dropped one variable (Task_annotated)
mutate: changed 8 values (100%) of ‘task_type’ (0 new NA)
The data in a tidy structure:
Now we can summarize the scores for each task type, and they should match the calculated value present in the first place.
group_by: one grouping variable (task_type)
summarize: now 3 rows and 2 columns, ungrouped