Updated Jan 2020. Fixed an error in the code that overestimated richness in empty cells, and added a case study.
Counting how many underlying features intersect a layer of interest is a very common geoprocessing task, and has always been possible with GIS software. I’ve been getting into the sf package to do spatial analyses in R, and wanted to document and share this approach for quantifying and plotting species richness.
The code below is a fully reproducible (as long as the relevant packages are installed). I chose Costa Rica as an example. First, we will generate random points iteratively, and a case study with bat species is at the end of the post. Shapefiles can be read easily into sf using st_read. Here, I calculated gridded richness for point data first and then for convex hulls because those are some of the most common approaches used nowadays.
These are the main steps in the process:
Setup
subset a world map to get a single-country polygon
generate a random number of random points within the country for n different ‘species’
create smoothed convex hulls around each set of points
Geoprocessing
generate a grid to cover our country polygon
intersect and join the multipoint feature set and the convex hulls with the grid (separately). Note that we shouldn’t count the cells that don’t overlap the target feature.
Plotting
plot the richness grids using perceptually-uniform and colorblind-friendly palettes using ggplot, scico and sf
The different spatial elements look like this:
The blank map
Randomly generated points for n ‘species’
Smoothed convex hulls around the sets of points
Grid for calculating richness
Gridded richness for points
Gridded richness for convex hulls
Notes:
The rerun function from purrr is awesome, I hadn’t seen it before but it is a very cool replacement for most basic for loops. I don’t know how I missed it.
When plotting, we use a bounding box and and the st_touches function with some tidyverse magic (slicing and plucking) to get the adjacent countries so we can add more context to our plot and focus in on our polygon of interest without having to set up the limits manually.
We counted the number of intersecting features using a grid, but this can also be done using political boundaries (states, municipalities, counties), vegetation types or any other layer of interest.
Case study
We can apply this same approach to real data, using point occurrence data (long/lat) in the form of an sf multipoint object. We’ll use some point data for bats, downloaded using rgbif. We can group and summarize the three-column table (species, longitude, and latitude) and assing the appropriate projection. The rest of the workflow is unchanged, and it can also be applied to shapefiles.