You ‘tidyr::complete()’ me

Here’s a quick example of how we can use dplyr and tidyr functions to complete sequences in a data frame given start and end values stored in separate columns. This was originally asked in a Spanish-Language R Facebook group and makes for a good use case of pivoting plus the complete() and full_seq() functions from tidyr which I had never written about nor used in my teaching materials.

For a tabular object like this one below, we want to expand the sequence (days) for each category, repeating the values of longitude as needed.

category	start_day	end_day	longitude
A	6	8	12
B	18	23	15
C	19	21	11
D	2	6	13

We want to end up with this:

category	day	longitude
A	6	12
A	7	12
A	8	12
B	18	15
B	19	15
B	20	15
B	21	15
B	22	15
B	23	15
C	19	11
C	20	11
C	21	11
D	2	13
D	3	13
D	4	13
D	5	13
D	6	13

Here’s how to complete the data frame with the missing combinations of data. Let’s set up our data first.

library(dplyr) # CRAN v1.1.4
library(tidyr) # CRAN v1.3.1

mydat <-
  tibble(category=c("A","B","C","D"),
         start_day=c(6,18,19,2),
         end_day=c(8,23,21,6),
         longitude=c(12,15,11,13))

Our initial data looks like this:

> mydat
# A tibble: 4 × 4
  category start_day end_day longitude
  <chr>        <dbl>   <dbl>     <dbl>
1 A                6       8        12
2 B               18      23        15
3 C               19      21        11
4 D                2       6        13

In this initial structure, the data is in wide format so we need to pivot the data so that start and end days for each group are together in the same variable.

Pivot longer like so:

mydat %>% pivot_longer(cols = c(start_day,end_day),
                       names_to = "day_type",values_to = "day")

We now have the key-value pairs and the data in long format.

# A tibble: 8 × 4
  category longitude day_type    day
  <chr>        <dbl> <chr>     <dbl>
1 A               12 start_day     6
2 A               12 end_day       8
3 B               15 start_day    18
4 B               15 end_day      23
5 C               11 start_day    19
6 C               11 end_day      21
7 D               13 start_day     2
8 D               13 end_day       6

Now we need to complete the sequence. For example: for category A we need separate rows for day 6, 7, and 8. For group B we need rows for days 18, 19, 20, 21, 22, and 23, and so on.

To create the full sequence of values in a vector, we can use tidyr::full_seq(). This function takes a numeric vector and a value describing the gaps or increments that need to be filled in.

full_seq(c(1,3,7),1) will return the same as 1:7.

complete() is a nice wrapper for another tidyr function: expand(). complete is pipe-friendly and operates on data frames (including grouped data frames).

On the grouped data frame, let’s complete the new day variable in increments of one, then to get the desired result lets fill() in the missing values in the longitude column and clean up.

mydat %>%
  pivot_longer(
    cols = c(start_day, end_day),
    names_to = "day_type", values_to = "day") %>%
  group_by(category) %>%
  complete(day = full_seq(day, 1)) %>%
  fill(longitude) %>%
  select(-day_type)

The result:

# A tibble: 17 × 3
# Groups:   category [4]
   category   day longitude
   <chr>    <dbl>     <dbl>
 1 A            6        12
 2 A            7        12
 3 A            8        12
 4 B           18        15
 5 B           19        15
 6 B           20        15
 7 B           21        15
 8 B           22        15
 9 B           23        15
10 C           19        11
11 C           20        11
12 C           21        11
13 D            2        13
14 D            3        13
15 D            4        13
16 D            5        13
17 D            6        13

Pretty cool! As always feel free to contact me with feedback or questions.

Share on

Twitter Facebook LinkedIn

You ‘tidyr::complete()’ me

Luis D. Verde Arregoitia

Share on

You may also enjoy

Cell and text formatting is everywhere

2024 LLMs/genAI + R roundup

Descargar, procesar, y acomodar imágenes en un mosaico hexagonal interactivo

Download, batch process, and tile images

category	day	longitude
A	6	12
A	7	12
A	8	12
B	18	15
B	19	15
B	20	15
B	21	15
B	22	15
B	23	15
C	19	11
C	20	11
C	21	11
D	2	13
D	3	13
D	4	13
D	5	13
D	6	13

category	day	longitude
A	6	12
A	7	12
A	8	12
B	18	15
B	19	15
B	20	15
B	21	15
B	22	15
B	23	15
C	19	11
C	20	11
C	21	11
D	2	13
D	3	13
D	4	13
D	5	13
D	6	13

category	day	longitude
A	6	12
A	7	12
A	8	12
B	18	15
B	19	15
B	20	15
B	21	15
B	22	15
B	23	15
C	19	11
C	20	11
C	21	11
D	2	13
D	3	13
D	4	13
D	5	13
D	6	13