Using clusso with Maps

Introduction

Please report any issues here.

Messages from clusso have been suppressed for this tutorial, but there are many messages generated that we hope the user will find helpful. Feedback is welcome here.

This example uses simulated data which are meant for illustration of how to use clusso with maps in R.

Preparing the Map

Load clusso, as well as the tidyverse, tigris, sf, tmap, and sp packages:

library("clusso")
library(tidyverse)
library(tigris)
library(sf)
library(tmap)
library(sp)

If you haven't already, you can install clusso directly from GitHub:

library("devtools")
devtools::install_github("mkamenet3/clusso")

Using the counties() function from the tigris package, we load a county-level map of Wisconsin. We fortify() the map and create wi_map in order to use it with ggplot2:

wi <- tigris::counties("Wisconsin", cb=TRUE)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |==                                                                    |   4%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |=======                                                               |  11%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |=========                                                             |  14%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |============                                                          |  18%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |================                                                      |  24%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=====================                                                 |  31%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |===================================                                   |  51%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |=====================================                                 |  54%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |========================================                              |  58%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |==========================================                            |  61%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |=================================================                     |  71%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |===========================================================           |  85%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |===============================================================       |  91%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |=================================================================     |  94%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================|  99%
  |                                                                            
  |======================================================================| 100%
wi_map <- fortify(wi)

To plot the counties of Wisconsin, we use geom_map() from the ggplot2 package:

p1 <- ggplot() + geom_map(data=wi_map, map=wi_map, 
                    aes(x=long, y=lat, map_id=id),
                    color="black", fill="white", size=0.25) +
    coord_map() +
    ylab("Latitude") + 
    xlab("Longitude")
p1

Preparing the Data

Next, we simulate some data. For this example, we imagine that in each county there were a total number of people sampled (ntrials), some of which have outcome X (ncases). We assume the total number of people sampled come from a negative binomial distribution and use rnbinom() to simulate ntrials. We assume cases follow a binomial distribution and use rbinom() to draw cases from the population with probability \(\pi\). For illustration of clusso, we put a cluster with elevated rates in Wood, Portage, Waushera, Adams, Juneau, and Marquette counties. We will only be looking at space for this example so there is only a single time period.

#create ncases and ntrials for 72 counties in Wisconsin
set.seed(2)
ncounties <- length(unique(wi@data$NAME))
ntrials <- rnbinom(n=ncounties, mu=1000, size =10)
dat <- cbind.data.frame(id=unique(wi@data$NAME),ntrials)
dat <- dat %>%
    mutate(ncases = if_else(id %in% c("Wood", "Portage", "Waushara", "Adams", "Juneau","Marquette"), 
                            rbinom(ncounties,ntrials, 0.9), 
                            rbinom(ncounties,ntrials, 0.1)),
           time = "Period1")
str(dat)
## 'data.frame':    72 obs. of  4 variables:
##  $ id     : chr  "Ashland" "Grant" "Sawyer" "Crawford" ...
##  $ ntrials: num  668 636 737 743 1262 ...
##  $ ncases : int  71 66 60 67 106 67 131 89 54 680 ...
##  $ time   : chr  "Period1" "Period1" "Period1" "Period1" ...

The dataframe dat, that contains the county name (id), number of people sampled (ntrials), number of people with outcome X (ncases), and number of time periods (time).

Next, we need to merge dat with the SpatialPolygonsDataFrame, wi. We use merge() from the sp package to combine the two by county name (county name is the variable NAME in wi and is id in dat).

#merge to map
widat <-  sp::merge(wi,dat, by.x="NAME", by.y="id", duplicateGeoms=TRUE)
class(widat)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
head(widat)
##        NAME STATEFP COUNTYFP COUNTYNS       AFFGEOID GEOID LSAD      ALAND
## 2   Ashland      55      003 01581061 0500000US55003 55003   06 2706476264
## 22    Grant      55      043 01581081 0500000US55043 55043   06 2970403781
## 57   Sawyer      55      113 01581116 0500000US55113 55113   06 3257216619
## 12 Crawford      55      023 01581071 0500000US55023 55023   06 1477970887
## 19 Florence      55      037 01581078 0500000US55037 55037   06 1264232458
## 26     Iron      55      051 01581085 0500000US55051 55051   06 1963699141
##        AWATER ntrials ncases    time
## 2  3230710864     668     71 Period1
## 22   94520011     636     66 Period1
## 57  239883874     737     60 Period1
## 12   73976438     743     67 Period1
## 19   24111636    1262    106 Period1
## 26  416924548     773     67 Period1

In order to extract the centroid from each county, we first convert our new SpatialPolygonsDataFrame to a simple features (sf) object using st_as_sf() from the sf package. It is possible to extract centroids from the SpatialPolygonsDataFrame, but we do so here in order to also illustrate the sf package:

#convert to simple features sf
widatsf <- st_as_sf(widat)
class(widatsf)
## [1] "sf"         "data.frame"

To prepare widatsf for use with clusso(), we convert time and NAME to factors:

widatsf$time <- as.factor(widatsf$time)
widatsf$NAME <- as.factor(widatsf$NAME)

We extract the centroids using st_centroid() and extract the coordinates using st_coordinates(); both functions are from the sf package.

coords <- as.data.frame(st_coordinates(st_centroid(widatsf)))
head(coords)
##           X        Y
## 1 -90.67794 46.31608
## 2 -90.70620 42.86748
## 3 -91.14454 45.87998
## 4 -90.93104 43.23947
## 5 -88.39810 45.84846
## 6 -90.24206 46.26227

We can plot the centroids on top of the base map we created above, p1:

p1 + geom_point(data=coords,aes(x=X, y=Y), color="red", shape=3)

We can visualize the proportion of cases from the total number of trials by creating a new variable, prop:

widatsf <- widatsf %>%
    mutate(prop = ncases/ntrials)

ggplot(data=widatsf) +
    geom_histogram(aes(x=prop),binwidth = 0.01, 
                   fill="grey", color="black") +
    theme_minimal() +
    xlab("Proportion") +
    ggtitle("Histogram of Proportion Cases/Trials")

We visualize the proportion on the Wisconsin map. We take the natural log of the proportion to better see small differences at the lower end in the proportion:

ggplot(widatsf) + 
    geom_sf(aes(fill=log(prop))) +
    scale_fill_viridis_c() +
    xlab("Longitude") +
    ylab("Latitude") +
    ggtitle("(Log) Proportion of Cases/Trials in Wisconsin")

Using clusso

We first set global parameters, namely the maximum radius size for each potential cluster (rMax <- 100), the number of time periods to 1 (Time <- 1). We also extract our x and y coordinates, which are in longitude and latitude, and set them as xx and yy.

To perform the regularized spatial and spatio-temporal cluster detection using the Lasso penalty, we use the function clusso(). In the first argument, we set the name of the dataframe, df=widatsf. In the second argument we set expected counts equal to the total number of trials in the widatsf dataframe (expected = ntrials). In the third argument, we set the observed counts equal to the total number of cases in each county in the widatsf dataframe (observed = ncases). In the fourth argument, we set the timeperiod equal to the name of the time periods in the widatsf dataframe (timeperiod = time). In this example, we do not want to include other covariates as un-penalized terms in the model so we set covars = FALSE.

The next two arguments are x and y, which are the longitude and latitude for each centroids. Since these are longitude and latitude coordinates, we also set utm=FALSE. This will tell clusso to use the great-circle-distance or Haversine distance to calculate the distance between a centroid up to rMax.

The next argument is the maximum radius (in km) for the potential clusters. We set rMax = rMax, which we defined above as 100km. The argument utm specifies that our coordinates are NOT in UTM coordinates (utm = FALSE). The is analysis argument defines the type of analysis to be performed. Options include space, spacetime, or both. Here, we set analysis equal to space, which will perform only the spatial analysis. The argument model is the type of model to be used. In this example, we specify the binomial model. The argument maxclust is the maximum number of actual clusters expected in the given area. This is based on scientific knowledge of the area. For this dataset, we set maxclust=15. If the number of identified clusters exceeds maxclust, a warning will encourage you to increase your set maxclust.

The output of clusso() is assigned to the object resclusso. resclusso is a large list of lists. The recommended way to explore the results is to explore the names(resreal) and select each sub-list of interest.

rMax <- 100
Time <- 1

xx <- coords$X
yy <- coords$Y



resclusso <- clusso(df=widatsf,
                    expected = ntrials,
                    observed = ncases,
                    timeperiod = time,
                    id = NAME,
                    covars = FALSE,
                    x= xx,
                    y= yy,
                    rMax = rMax,
                    utm=FALSE,
                    analysis = "space",
                    model="binomial",
                    maxclust = 15, collapsetime = TRUE)
clussopretty(resclusso, analysis="space", model="binomial",clusteridentify=FALSE)
##            model analysistype numclust.AIC numclust.AICc numclust.BIC
## 1       Binomial        Space           17            17           17
## 2 Quasi-Binomial        Space           16            16           13

Plotting clusso Results on the Map

We set the max and min relative risk for the color scheme using maxrr and minrr. You may want to explore these values depending on the presentation you are using (a shortened maxrr, minrr distance will bring colors in closer and maps on a projector will appear more vivid and darker).

We extract the relative risk estimates using BIC from resclusso$lassoresult.p.s$E.qbic and then use redblue() to map the estimated relative risks to colors.

maxrr <- 10
minrr <- 0.1
estrr.bic <- resclusso$lassoresult.p.s$E.qbic
cluster_ix <- redblue(log(maxrr *  pmax(minrr, pmin(estrr.bic, maxrr)))/log(maxrr^2))
table(cluster_ix)
## cluster_ix
## #063365 #124A86 #1D5FA2 #2166AC #266CAF #3681BA #3783BB #3885BB #408FC1 #4493C3 
##       1       1       1       2       1       2       3       2       3       1 
## #4594C4 #4795C4 #4996C5 #4D99C6 #4E99C6 #4E9AC6 #539DC8 #5DA3CC #62A7CD #67AACF 
##       3       1       1       4       1       7       4       2       1       1 
## #6EAED1 #7CB7D6 #86BDDA #87BEDA #8BC0DB #91C4DD #97C7DF #9FCBE1 #A8D0E4 #BDDBEA 
##       1       2       1       3       5       4       1       3       1       1 
## #F2F4F6 #F6F6F6 #F8BDA0 
##       1       1       6

Next, we combine the estimated relative risks (resclusso$lassoresult.p.s$E.qbic), county names (widatsf$NAME), and mapped colors (cluster_ix) into the dataframe rr.bic. We map the county names onto the colors (names(cluster_ix) <- rr.bic$county).

rr.bic <- cbind.data.frame(rr=resclusso$lassoresult.p.s$E.qbic, 
                           county=widatsf$NAME, 
                           colors = cluster_ix)
names(cluster_ix) <- rr.bic$county

We merge the dataframe rr.bic to the SpatialPolygonsDataFrame, wi, by county name and convert it to a simple features dataframe for use with ggplot2.

merged.bic<- sp::merge(wi,rr.bic, by.x="NAME", by.y="county", duplicateGeoms=TRUE)
datdf.bic <- st_as_sf(merged.bic)

Finally, we use geom_sf() to plot the estimated relative risks onto the map. We use the aesthetic (aes(fill=NAME)) and scale_fill_manual(values=cluster_ix) to plot our pre-defined colors onto the map.

p1+ geom_sf(data=datdf.bic, aes(fill=NAME)) +
    scale_colour_manual(values = c("red", "blue", "green")) +
    scale_fill_manual(values=cluster_ix)+
    theme(plot.title = element_text(hjust=0.5)) +
    ggtitle("Binomial, Space: BIC") 

For a simpler map, we can also isolate the counties with relative risks greater than 1 and only outline county edges with red:

ggplot()+ geom_sf(data=datdf.bic, lwd=0, fill='transparent') +
    geom_sf(data=subset(datdf.bic, rr>1), color="red", size=2) +
    theme_bw()+
    theme(legend.position = "none",
          plot.title = element_text(hjust=0.5)) +
    ggtitle("Binomial, Space: BIC")