Intersecting consecutive pairs of list elements in R

With the 2020 NBA playoffs set, someone wrote in an online discussion that it feels like the same teams reach the postseason every year. With few exceptions, I haven’t followed many seasons closely since the late 1990s, but I shared that feeling. This led me to work out how often teams reach the postseason in consecutive years using readily available data. To do so, I transcribed playoff series data for 1992-2020 from the Basketball Reference website and tried out some set intersections.

This post walks through applying intersects across consecutive pairs of list elements, using the 16 teams (8 per conference) that reach the NBA playoffs each season in relation to the prior year. The following code should be fully reproducible.

Tip-off

First we load the necessary libraries, read the data, pivot from wide to long, and clean up the loose ends.

# load packages
library(readr)     # CRAN v1.3.1
library(dplyr)     # [github::tidyverse/dplyr] v1.0.0.9000
library(stringr)   # CRAN v1.4.0
library(tidyr)     # CRAN v1.1.0
library(purrr)     # CRAN v0.3.4
library(ggplot2)   # CRAN v3.3.1
library(forcats)   # CRAN v0.5.0
library(ggalt)     # CRAN v0.4.0
library(artyfarty) # [github::datarootsio/artyfarty] v0.0.1
library(extrafont) # CRAN v0.17
# read from repository
playoffsrd <- read_csv("https://github.com/luisDVA/codeluis/raw/master/playoffsrdld.csv")
# melt, remove seeds, rename franchises
TeamsSeasons <- 
  playoffsrd %>% pivot_longer(WTeam:LTeam,values_to="Teams") %>% select(-name) %>% 
  mutate(Teams=str_remove(Teams,"\\s\\(.*")) %>% 
  mutate(Teams=case_when(str_detect(Teams,"^Seatt")~"Oklahoma City Thunder",
                         str_detect(Teams,"New Je")~"Brooklyn Nets",
                         str_detect(Teams,"Charlotte Bobcats")~"Charlotte Hornets",
                         str_detect(Teams,"Washington Bullets")~"Washington Wizards",
                         str_detect(Teams,"New Orleans Hornets")~"New Orleans Pelicans",
                         TRUE~Teams))

This results in a tidy, long-format dataset with season, conference, and team columns:

> TeamsSeasons
# A tibble: 464 x 3
      Yr conf    Teams                 
   <dbl> <chr>   <chr>                 
 1  2020 Western Los Angeles Lakers    
 2  2020 Western Portland Trail Blazers
 3  2020 Western Los Angeles Clippers  
 4  2020 Western Dallas Mavericks      
 5  2020 Western Denver Nuggets        
 6  2020 Western Utah Jazz             
 7  2020 Western Houston Rockets       
 8  2020 Western Oklahoma City Thunder 
 9  2020 Eastern Milwaukee Bucks       
10  2020 Eastern Orlando Magic         
# … with 454 more rows

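Now, we can use dplyr::group_split to split the data by season into a list of tibbles, and purrr::map to pull the Teams column out of each one, which leaves a list of character vectors named by season.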

# split into yearly lists, assign names
allTeamsSeasons <- TeamsSeasons %>% group_split(Yr) %>% purrr::map("Teams")
names(allTeamsSeasons) <- 
  TeamsSeasons %>% 
  group_by(Yr) %>% group_keys() %>% pull
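
As a minimal sketch of this step on made-up data (the data frame `toy` and its single-letter team labels are hypothetical, not part of the playoff dataset), base R's `split()` builds the same kind of named list of character vectors in one call, because it names each element after its grouping value:

```r
# hypothetical toy data: three seasons, three teams each
toy <- data.frame(
  Yr = rep(c(2018, 2019, 2020), each = 3),
  Teams = c("A", "B", "C", "B", "C", "D", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# split the Teams column by season; element names come from Yr
toy_list <- split(toy$Teams, toy$Yr)

names(toy_list)     # "2018" "2019" "2020"
toy_list[["2019"]]  # "B" "C" "D"
```

Whether this reads better than the group_split route is a matter of taste; it mainly avoids the separate naming step.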

A quick intersect (Reduce(intersect, allTeamsSeasons)) of all the list elements shows us that no single team has made the playoffs in all the years considered (some of the franchises haven’t even existed for that long).
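
To make the Reduce call concrete, here is a small sketch on a hypothetical list (the names and team labels are invented for illustration). Reduce folds intersect across the elements from left to right, so only values present in every element survive; a result of length zero means nothing is shared by all of them.

```r
# hypothetical toy list of teams per season
toy_list <- list(
  "2018" = c("A", "B", "C"),
  "2019" = c("B", "C", "D"),
  "2020" = c("C", "D", "E")
)

# equivalent to intersect(intersect(toy_list[[1]], toy_list[[2]]), toy_list[[3]])
in_every_season <- Reduce(intersect, toy_list)
in_every_season  # "C"
```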

Because the set of 16 teams that make the playoffs can vary each year, I used a pivoting approach to get all combinations of seasons and teams, including NA values when a team didn’t qualify. If anyone knows of a better approach please let me know.

# pivot wider, then longer, to get all combinations
teamswide <-
  TeamsSeasons %>%
  mutate(qual = "yes") %>%
  pivot_wider(
    names_from = Yr,
    values_from = qual
  )
teamsQualLong <-
  teamswide %>% pivot_longer(`2020`:`1992`, names_to = "Season")
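
One possible alternative, offered as a sketch rather than a drop-in replacement, is tidyr::complete(), which adds rows for missing combinations directly. The toy data frame below is hypothetical. Wrapping conf and Teams in nesting() restricts the expansion to conference-team pairs that actually occur in the data, and Yr keeps its original type instead of becoming a character column of names.

```r
library(tidyr)

# hypothetical toy data: one row per team that qualified in a season
toy <- data.frame(
  Yr = c(2019, 2019, 2020, 2020),
  conf = c("Eastern", "Western", "Eastern", "Western"),
  Teams = c("A", "X", "B", "X"),
  stringsAsFactors = FALSE
)
toy$qual <- "yes"

# add a row for every observed (conf, Teams) pair in every season;
# combinations absent from the data get NA in qual
toy_all <- complete(toy, nesting(conf, Teams), Yr)
toy_all
```

Here three conference-team pairs across two seasons give six rows, two of them with NA in qual.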

Let’s see.

> teamsQualLong
# A tibble: 899 x 4
   conf    Teams              Season value
   <chr>   <chr>              <chr>  <chr>
 1 Western Los Angeles Lakers 2020   yes  
 2 Western Los Angeles Lakers 2019   NA   
 3 Western Los Angeles Lakers 2018   NA   
 4 Western Los Angeles Lakers 2017   NA   
 5 Western Los Angeles Lakers 2016   NA   
 6 Western Los Angeles Lakers 2015   NA   
 7 Western Los Angeles Lakers 2014   NA   
 8 Western Los Angeles Lakers 2013   yes  
 9 Western Los Angeles Lakers 2012   yes  
10 Western Los Angeles Lakers 2011   yes  
# … with 889 more rows

With this long-format tibble we can already draw a faceted bubble chart to track how individual teams are doing across the whole study period, and this should correspond to what we get when intersecting consecutive list elements.

# bubble chart
ggplot(teamsQualLong) +
  geom_point(aes(x = Season, y = fct_rev(Teams), shape = value, fill = conf), color = "black", size = 4) +
  scale_shape_manual(values = c(21, NA), guide = "none") +
  scale_fill_manual(values = c("#003ba6", "#dc0530"), guide = "none") +
  facet_grid(rows = vars(conf), scales = "free") +
  labs(
    y = "",
    x = ""
  ) +
  artyfarty::theme_five38() +
  theme(
    text = element_text(size = 18, family = "Loma"),
    axis.text.x = element_text(size = 8),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 15)
  )

Looks good, even with minimal customization.

Now, we can split the grouped data for each conference into lists.

# list of teams by season and conference
# Eastern
TSList_E <- TeamsSeasons %>%
  filter(conf == "Eastern") %>%
  group_split(Yr) %>%
  purrr::map("Teams")
names(TSList_E) <-
  TeamsSeasons %>%
  filter(conf == "Eastern") %>%
  group_by(Yr) %>%
  group_keys() %>%
  pull()
# Western
TSList_W <- TeamsSeasons %>%
  filter(conf == "Western") %>%
  group_split(Yr) %>%
  purrr::map("Teams")
names(TSList_W) <-
  TeamsSeasons %>%
  filter(conf == "Western") %>%
  group_by(Yr) %>%
  group_keys() %>%
  pull()
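
The two blocks above differ only in the conference name, so the step could be wrapped in a small function. The sketch below uses base R and a hypothetical helper named `teams_by_season`, shown on made-up data; it is one way to cut the repetition, not the approach used in this post.

```r
# hypothetical toy data with both conferences
toy <- data.frame(
  Yr = c(2019, 2019, 2019, 2020, 2020, 2020),
  conf = c("Eastern", "Eastern", "Western", "Eastern", "Eastern", "Western"),
  Teams = c("A", "B", "X", "B", "C", "X"),
  stringsAsFactors = FALSE
)

# hypothetical helper: named list of teams per season for one conference
teams_by_season <- function(df, conference) {
  one_conf <- df[df$conf == conference, ]
  split(one_conf$Teams, one_conf$Yr)
}

toy_E <- teams_by_season(toy, "Eastern")
toy_W <- teams_by_season(toy, "Western")
toy_E[["2020"]]  # "B" "C"
```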

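… and then apply an intersect over consecutive pairs of list elements with a nifty mapply approach: negative indices drop the last element from one copy of the list and the first element from another, so each season lines up with the one that follows it.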

# which teams reach playoffs in consecutive seasons
intersect_consecutive <-
  function(veclist) {
    # SIMPLIFY = FALSE keeps a list even if every intersection has the same length
    mapply(function(x, y) intersect(x, y), veclist[-length(veclist)], veclist[-1], SIMPLIFY = FALSE)
  }
yearlyPSeasonTeamsE <- intersect_consecutive(TSList_E)
yearlyPSeasonTeamsW <- intersect_consecutive(TSList_W)
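
A worked example on a hypothetical four-season list may help show what the pairing does. The sketch passes SIMPLIFY = FALSE so that mapply always returns a list, and it relies on mapply's default of naming the results after the first list it receives, which here is the earlier season of each pair.

```r
# hypothetical toy list, ordered by season
toy_list <- list(
  "2018" = c("A", "B", "C"),
  "2019" = c("B", "C", "D"),
  "2020" = c("C", "D", "E"),
  "2021" = c("F", "G", "H")
)

# pair element i (all but the last) with element i + 1 (all but the first)
earlier <- toy_list[-length(toy_list)]
later <- toy_list[-1]

# one intersection per consecutive pair of seasons
repeats <- mapply(intersect, earlier, later, SIMPLIFY = FALSE)

names(repeats)      # "2018" "2019" "2020"
repeats[["2018"]]   # "B" "C"
repeats[["2020"]]   # character(0): no team carried over into 2021
```

Four seasons yield three pairs, so the result always has one element fewer than the input list.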

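We can now count how many teams in a given season also played in the previous year’s postseason. Note that mapply names each result after the first list in the pair, so each count is labeled with the earlier of the two seasons.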

# how many teams from the previous season made the playoffs in the next one?
turnoverE <- yearlyPSeasonTeamsE %>%
  map_dbl(length) %>%
  tibble::enframe("year", "teams") %>%
  mutate_all(as.numeric) %>%
  mutate(conf = "Eastern Conference")
turnoverW <- yearlyPSeasonTeamsW %>%
  map_dbl(length) %>%
  tibble::enframe("year", "teams") %>%
  mutate_all(as.numeric) %>%
  mutate(conf = "Western Conference")
allSeasonsEWturnover <- bind_rows(turnoverE, turnoverW)
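
The counting step can also be sketched in base R. The list `repeats` below is a hypothetical stand-in for the output of the intersect step; `lengths()` returns one count per element and keeps the names, which are character and therefore need converting before they can serve as a numeric year.

```r
# hypothetical result of intersecting consecutive seasons
repeats <- list(
  "2018" = c("B", "C"),
  "2019" = c("C", "D"),
  "2020" = character(0)
)

# one count per list element, names preserved
counts <- lengths(repeats)

# build a two-column table with numeric year and team count
toy_turnover <- data.frame(
  year = as.numeric(names(counts)),
  teams = as.numeric(counts)
)
toy_turnover
```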

The resulting long-format tibble is also ready for plotting.

> allSeasonsEWturnover
# A tibble: 56 x 3
    year teams conf              
   <dbl> <dbl> <chr>             
 1  1992     6 Eastern Conference
 2  1993     6 Eastern Conference
 3  1994     6 Eastern Conference
 4  1995     6 Eastern Conference
 5  1996     6 Eastern Conference
 6  1997     5 Eastern Conference
 7  1998     4 Eastern Conference
 8  1999     6 Eastern Conference
 9  2000     7 Eastern Conference
10  2001     5 Eastern Conference
# … with 44 more rows

We can plot these data as a line chart with an EKG feel to it using splines from ggalt.

ggplot(allSeasonsEWturnover, aes(x = year, y = teams, color = conf)) +
  geom_xspline(size = 2) +
  scale_color_manual(values = c("#003ba6", "#dc0530"), guide = "none") +
  scale_x_continuous(
    breaks = unique(allSeasonsEWturnover$year),
    guide = guide_axis(n.dodge = 2)
  ) +
  facet_grid(~conf) +
  labs(
    y = "Number of teams that made\n the playoffs in previous season",
    x = "Season"
  ) +
  artyfarty::theme_five38() +
  theme(
    text = element_text(size = 18, family = "Loma"),
    axis.text.x = element_text(size = 7),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 22)
  )

This output checks out with the bubble chart.

At first glance:

Lots of playoff appearances by the Spurs, sadly breaking their streak this season.
More teams in the West reappearing in consecutive seasons.
Long gaps without playoffs for several franchises.
Same eight teams in the East played in 2011 and 2012. Almost the same teams from last year playing this year in the East.
Anything else?

All feedback welcome.


Luis D. Verde Arregoitia
