Spreadsheet hijinks

Earlier in the week, Jenny Bryan helped me ask the Twitter community what to call this widely used spreadsheet habit (see the image in my tweet).

I kept track of the replies to my tweet and to Jenny's retweet. Here are most of the suggested names, as a table:

Term given | User handle
Replies to Luis | NA
pain in the neck | @AnneMarie_DC
interrupting subheaders | @pandapoop42
Interstitial group labels | @BrodieGaslam
Nested relational model | @arnabdotorg
subgroups | @Thoughtfulnz
group titles, group names | @benomial
partial normalization for human/visual consumption | @jgraham909
groups, grouping | @jgraham909
demon rows | @NthChapter
Meta-data | @IsabellaGhement
Embheaders (embedded headers) | @tammylarmstrong
pivots | @antonycourtney
spreadsheet block groups, spreadsheet sub-table groups, sub-table groups | @cormac85
Meta-data headers | @cbirunda
group representatives, grouping criterion | @Teggy
complete shit | @StevenNHart
2D matrix in a column of a data frame | @dnlakhotia
subgroups | @enoches
paragraph grouping | @gshotwell
Highlighted Collapsed Factor | @PragmaticDon
small multiples | @nacnudus
Replies to Jenny | NA
Merged cells gone wild | @RikaGorn
windowmakers, widowmakers | @polesasunder
rowgory, separators | @EmilyRiederer
Factros (factor rows) | @EmilyRiederer
Growps = row + groups | @thmscwlls
20 minutes of uninterrupted screaming | @tpoi
premature tabulation | @pdalgd
Read bumps | @MilesMcBain
Row group headers | @dmik3
factor interruptus | @zentree
Beheaders | @djhocking
Third Abnormal Form | @pitakakariki
Hydra | @JasonWilliamsNY
stubs | @IanDennisMiller
nuisance categorical (or subgroup) variables | @nspyrison
Categorical nuisance formatting | @nspyrison
Business logic | @doomsuckle
Data beheading! Factorless features, grouping gone wrong… | @SamanthaSifleet
Adjacent attribution | @dagoodman1
group names | @benomial
facet but in tabular form | @kdpsinghlab
murder of rows | @RileenSinha
GroupNotRow | @kevin_lanning

Overall, there was no clear-cut consensus, but a few themes kept popping up: groups, subgroups, headers, and row groups. Everyone seemed familiar with this somewhat annoying practice, and people from different disciplines pitched in with interpretations that often drew on concepts from database normalization or pivot tables.

Personally, I'm now partial to calling these things embedded subheaders. The header row typically contains the variable names, and a subheader is a more flexible version of the same idea. Here, the subheaders are embedded in the data rectangle to define subgroups or slices of the data, analogous to the small multiples concept from data visualization, as Duncan Garmonsway suggested (he also discusses this in his Spreadsheet Munging book).

I particularly liked adjacent attribution (suggested by Daniel Goodman) as a way to explain how embedded subheaders are expected to work. As far as I could find out, it is a computer science term used when defining the clauses that parse text strings. Embedded subheaders imply that the rows below them belong to a subgroup until a new subheader indicates otherwise, so assigning rows to their groups is a good example of attribution by adjacency.
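The adjacency rule can be sketched in a few lines of Python. This is a minimal illustration, not how unheadr works internally. It assumes that rows arrive as lists of strings and that a subheader can be recognized by a simple rule: here, a hypothetical one where only the first cell has text. The function names and the toy data are made up for the example. Each data row inherits the most recent subheader above it, and the subheader rows themselves are dropped.

```python
from typing import Callable, List, Optional

Row = List[str]


def attribute_by_adjacency(
    rows: List[Row],
    is_subheader: Callable[[Row], bool],
) -> List[List[Optional[str]]]:
    """Prepend to each data row the nearest subheader above it.

    Hypothetical helper for illustration. Subheader rows are removed from
    the output; data rows that appear before any subheader get None.
    """
    current_group: Optional[str] = None
    tidy: List[List[Optional[str]]] = []
    for row in rows:
        if is_subheader(row):
            # A new subheader opens a new group; it is not itself data.
            current_group = row[0]
        else:
            # Data rows belong to the most recent subheader (adjacency).
            tidy.append([current_group, *row])
    return tidy


def only_first_cell_filled(row: Row) -> bool:
    """Assumed detection rule: text in the first cell and nowhere else."""
    return bool(row[0]) and all(cell == "" for cell in row[1:])


if __name__ == "__main__":
    # Toy spreadsheet with two embedded subheaders (illustrative data only).
    sheet = [
        ["Carnivores", ""],
        ["Lion", "Africa"],
        ["Wolf", "Eurasia"],
        ["Herbivores", ""],
        ["Bison", "North America"],
    ]
    for tidy_row in attribute_by_adjacency(sheet, only_first_cell_filled):
        print(tidy_row)
    # ['Carnivores', 'Lion', 'Africa']
    # ['Carnivores', 'Wolf', 'Eurasia']
    # ['Herbivores', 'Bison', 'North America']
```

The detection rule is the fragile part: real spreadsheets may mark subheaders with bold text, merged cells, or a regular expression match on a specific column, so in practice the predicate would need to be swapped for whatever identifies subheaders in a given file.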

Lastly, I liked the name factros (factor rows), suggested by Emily Riederer. It has a cool tidyverse ring to it, and I will try to work it in when I update the documentation for unheadr (an R package that can untangle most cases of embedded subheaders) with everyone's feedback.

If you have any other suggestions please let me know.