Luis Verde Arregoitia bio photo

Luis Verde Arregoitia

Personal research page - ecology . evolution . conservation . biogeography

Twitter Publons ResearchGate Google Scholar

With the NBA season about to restart, someone wrote in an online discussion that it feels like it’s always the same teams reaching the postseason every year. With few exceptions, I haven’t followed many seasons closely since the late 1990s, but I did share that feeling. This somehow led me to try and work out how often teams are reaching the postseason in consecutive years using readily available data. To address this, I transcribed playoff series data from 1992-2020 from the Basketball Reference website, and tried out some set intersects.

This post walks through applying intersects across consecutive pairs of list elements, using the 16 teams (8 per conference) that reach the NBA playoffs each season in relation to the year prior. The following code should be fully reproducible.

Tip-off

First we load the necessary libraries, read the data, pivot from wide to long, and clean up the loose ends.

# load packages
library(readr)     # CRAN v1.3.1
library(dplyr)     # [github::tidyverse/dplyr] v1.0.0.9000
library(stringr)   # CRAN v1.4.0
library(tidyr)     # CRAN v1.1.0
library(purrr)     # CRAN v0.3.4
library(ggplot2)   # CRAN v3.3.1
library(forcats)   # CRAN v0.5.0
library(ggalt)     # CRAN v0.4.0
library(artyfarty) # [github::datarootsio/artyfarty] v0.0.1
library(extrafont) # CRAN v0.17
# read from repository
playoffsrd <- read_csv("https://github.com/luisDVA/codeluis/raw/master/playoffsrdld.csv")
# melt, remove seeds, rename franchises
TeamsSeasons <- 
  playoffsrd %>% pivot_longer(WTeam:LTeam,values_to="Teams") %>% select(-name) %>% 
  mutate(Teams=str_remove(Teams,"\\s\\(.*")) %>% 
  mutate(Teams=case_when(str_detect(Teams,"^Seatt")~"Oklahoma City Thunder",
                         str_detect(Teams,"New Je")~"Brooklyn Nets",
                         str_detect(Teams,"Charlotte Bobcats")~"Charlotte Hornets",
                         str_detect(Teams,"Washington Bullets")~"Washington Wizards",
                         str_detect(Teams,"New Orleans Hornets")~"New Orleans Pelicans",
                         TRUE~Teams))

Resulting in a tidy, long format dataset with seasons, conference, and teams:

> TeamsSeasons
# A tibble: 448 x 3
      Yr conf    Teams                
   <dbl> <chr>   <chr>                
 1  2019 Eastern Milwaukee Bucks      
 2  2019 Eastern Detroit Pistons      
 3  2019 Eastern Toronto Raptors      
 4  2019 Eastern Orlando Magic        
 5  2019 Eastern Philadelphia 76ers   
 6  2019 Eastern Brooklyn Nets        
 7  2019 Eastern Boston Celtics       
 8  2019 Eastern Indiana Pacers       
 9  2019 Western Golden State Warriors
10  2019 Western Los Angeles Clippers 
# … with 438 more rows

Now, we can use dplyr::group_split to split the grouped data into a list of vectors.

# split into yearly lists, assign names
allTeamsSeasons <- TeamsSeasons %>% group_split(Yr) %>% purrr::map("Teams")
names(allTeamsSeasons) <- 
  TeamsSeasons %>% 
  group_by(Yr) %>% group_keys() %>% pull

A quick intersect (Reduce(intersect, allTeamsSeasons)) of all the list elements shows us that no single team has made the playoffs in all the years considered (some of the franchises haven’t even existed for that long).

Because the set of 16 teams that make the playoffs can vary each year, I used a pivoting approach to get all combinations of seasons and teams, including NA values when a team didn’t qualify.

# spread and gather to get all combinations
teamswide <-
  TeamsSeasons %>%
  mutate(qual = "yes") %>%
  pivot_wider(
    names_from = Yr,
    values_from = qual
  )
teamsQualLong <-
  teamswide %>% pivot_longer(`2019`:`1992`, names_to = "Season")

Let’s see.

> teamsQualLong
# A tibble: 868 x 4
   conf    Teams           Season value
   <chr>   <chr>           <chr>  <chr>
 1 Eastern Milwaukee Bucks 2019   yes  
 2 Eastern Milwaukee Bucks 2018   yes  
 3 Eastern Milwaukee Bucks 2017   yes  
 4 Eastern Milwaukee Bucks 2016   NA   
 5 Eastern Milwaukee Bucks 2015   yes  
 6 Eastern Milwaukee Bucks 2014   NA   
 7 Eastern Milwaukee Bucks 2013   yes  
 8 Eastern Milwaukee Bucks 2012   NA   
 9 Eastern Milwaukee Bucks 2011   NA   
10 Eastern Milwaukee Bucks 2010   yes  
# … with 858 more rows

With this long-format tibble we can already draw a faceted bubble chart to track how individual teams are doing across the whole study period, and this should correspond to what we get when intersecting consecutive list elements.

# bubble chart
ggplot(teamsQualLong) +
  geom_point(aes(x = Season, y = fct_rev(Teams), shape = value, fill = conf), color = "black", size = 4) +
  scale_shape_manual(values = c(21, NA), guide = F) +
  scale_fill_manual(values = c("#003ba6", "#dc0530"), guide = F) +
  facet_grid(rows = vars(conf), scales = "free") +
  labs(
    y = "",
    x = ""
  ) +
  artyfarty::theme_five38() +
  theme(
    text = element_text(size = 18, family = "Loma"),
    axis.text.x = element_text(size = 8),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 15)
  )
click to enlarge

Looks good, even with minimal customization.

Now, we can split the grouped data for each conference into lists.

# list of teams by season and conference
# Eastern
TSList_E <- TeamsSeasons %>%
  filter(conf == "Eastern") %>%
  group_split(Yr) %>%
  purrr::map("Teams")
names(TSList_E) <-
  TeamsSeasons %>%
  filter(conf == "Eastern") %>%
  group_by(Yr) %>%
  group_keys() %>%
  pull()
# Western
TSList_W <- TeamsSeasons %>%
  filter(conf == "Western") %>%
  group_split(Yr) %>%
  purrr::map("Teams")
names(TSList_W) <-
  TeamsSeasons %>%
  filter(conf == "Western") %>%
  group_by(Yr) %>%
  group_keys() %>%
  pull()

… and then apply an intersect over consecutive pairs of list elements with a nifty mapply approach that relies on indices.

# which teams reach playoffs in consecutive seasons
intersect_consecutive <-
  function(veclist) {
    mapply(function(x, y) intersect(x, y), veclist[-length(veclist)], veclist[-1])
  }
yearlyPSeasonTeamsE <- intersect_consecutive(TSList_E)
yearlyPSeasonTeamsW <- intersect_consecutive(TSList_W)

We can now count which teams in a given season also played in the previous years’ postseason.

# how many teams from the previous season made the playoffs in the next one?
turnoverE <- yearlyPSeasonTeamsE %>%
  map_dbl(length) %>%
  tibble::enframe("year", "teams") %>%
  mutate_all(as.numeric) %>%
  mutate(conf = "Eastern Conference")
turnoverW <- yearlyPSeasonTeamsW %>%
  map_dbl(length) %>%
  tibble::enframe("year", "teams") %>%
  mutate_all(as.numeric) %>%
  mutate(conf = "Western Conference")
allSeasonsEWturnover <- bind_rows(turnoverE, turnoverW)

The resulting long-format tibble is also ready for plotting.

> allSeasonsEWturnover
# A tibble: 54 x 3
    year teams conf              
   <dbl> <dbl> <chr>             
 1     1     6 Eastern Conference
 2     2     6 Eastern Conference
 3     3     6 Eastern Conference
 4     4     6 Eastern Conference
 5     5     6 Eastern Conference
 6     6     5 Eastern Conference
 7     7     4 Eastern Conference
 8     8     6 Eastern Conference
 9     9     7 Eastern Conference
10    10     5 Eastern Conference
# … with 44 more rows

We and plot these data as line chart with an EKG feel to it using splines from ggalt.

ggplot(allSeasonsEWturnover, aes(x = year, y = teams, color = conf)) +
  geom_xspline(size = 2) +
  scale_color_manual(values = c("#003ba6", "#dc0530"), guide = F) +
  scale_x_continuous(
    breaks = unique(allSeasonsEWturnover$year),
    guide = guide_axis(n.dodge = 2)
  ) +
  facet_grid(~conf) +
  labs(
    y = "Number of teams that made\n the playoffs in previous season",
    x = "Season"
  ) +
  artyfarty::theme_five38() +
  theme(
    text = element_text(size = 18, family = "Loma"),
    axis.text.x = element_text(size = 7),
    strip.background = element_blank(),
    strip.text = element_text(face = "bold", size = 22)
  )
click to enlarge

This output checks out with the bubble chart.

At first glance:

  • Lots of playoff appearances by the Spurs.
  • More teams in the West reappearing in consecutive seasons.
  • Long gaps without playoffs for several franchises.
  • Same eight teams in the East played in 2011 and 2012.
  • Anything else?

I’ll try to update the post for 2020 once the seeds are defined.

All feedback welcome.