Using clusso with Case-Control Data

Introduction

Please report any issues here.

Messages from clusso have been suppressed for this tutorial, but there are many messages generated that we hope the user will find helpful. Feedback is welcome here.

Load clusso package:

library("clusso")

If you haven't already, you can install clusso directly from GitHub:

library("devtools")
devtools::install_github("mkamenet3/clusso")

Using clusso with case-control data

In this example, we will use simulated case-control data of across 208 prefects (municipalities) in Japan across 5 time periods.

Prepare Data

We load four data sets that come with the clusso package:

#load data
data("ccsjbc")
data("utmJapan")

These data sets contain:

  1. ccsjbc: data set containing a unique identifier for each prefect centroid (id), period of observation (categorical variable with 5 levels, period), number of cases per each geographic unit (numcases), the total number of cases and controls from each prefect (n). These data have been simulated.
  2. utmJapan: data set containing a unique identifier for each prefect centroid (id), x-coordinate in UTM (utmx), and y-coordinate in UTM (utmy).

To explore the 4 data sets, we apply the head() function to each of the data sets.

#inspect
head(utmJapan)
##     id     utmx    utmy
## 1 9201 399786.6 4047756
## 2 9202 360917.0 4023885
## 3 9203 385175.1 4025749
## 4 9204 371603.4 4018172
## 5 9205 388154.2 4047900
## 6 9206 375023.3 4068053
head(ccsjbc)
##     id period numcases     n   covar1     covar2 covar3 covar4 covar5
## 1 9201   7906     2944 10050 458.5877 0.01894360     34      1      0
## 2 9201   7922     2997 10032 470.2921 0.08078908     40      1      0
## 3 9201   7938     3044 10128 490.6435 0.02267590     32      0      1
## 4 9201   7954     3031 10227 501.6789 0.03259346     35      0      0
## 5 9201   7970     2976  9739 531.8578 0.01600942     32      0      1
## 6 9202   7906     3097 10565 521.2707 0.05811436     31      0      0

As a data-cleaning step, we first convert period to a factor in the data frame jbc:

ccsjbc$period <- as.factor(ccsjbc$period)
ccsjbc$id <- as.factor(ccsjbc$id)

Set global parameters

When using clusso, there are certain global parameters that need to be set.

We take the easting and northing coordinates from the utmJapan data set, set the easting coordinate, utmx, to xx and the northing coordinate, utmy, to yy. Each is divided by 1000 to change the scale from meters to kilometers.

xx <- utmJapan$utmx/1000
yy <- utmJapan$utmy/1000

Below, we set the maximum radius for a cluster to 20 km (rMax=20) based on scientific knowledge of the area and our definition of a meaningful cluster, set the number of unique time periods to 5 (Time=5).

rMax <- 20 
Time <- 5

Fitting using clusso

The argument model is the type of model to be used. In this case-control study example, we specify the binomial model.

The output of clusso() is assigned to the object resreal. resreal is a large list of lists. The recommended way to explore the results is to explore the names(resreal) and select each sub-list of interest.

head(ccsjbc)
##     id period numcases     n   covar1     covar2 covar3 covar4 covar5
## 1 9201   7906     2944 10050 458.5877 0.01894360     34      1      0
## 2 9201   7922     2997 10032 470.2921 0.08078908     40      1      0
## 3 9201   7938     3044 10128 490.6435 0.02267590     32      0      1
## 4 9201   7954     3031 10227 501.6789 0.03259346     35      0      0
## 5 9201   7970     2976  9739 531.8578 0.01600942     32      0      1
## 6 9202   7906     3097 10565 521.2707 0.05811436     31      0      0
system.time(resrealccs <- clusso(df=ccsjbc, expected = n, observed = numcases,
                                 timeperiod = period,id=id,
                                 covars=FALSE, x= xx,y = yy, 
                                 rMax =  rMax, utm=TRUE, 
                                 analysis="both", model="binomial",
                                 maxclust=15, collapsetime = FALSE))
## Running binomial both spatial and spatio-temporal model(s).
## Creating radius-based potential clusters
## No covariates found
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## No covariates found
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2169.4765
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## No covariates found
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## No covariates found
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2169.4765
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## All models ran successfully
##    user  system elapsed 
##   42.91    0.21   43.28
clussopretty(resrealccs, analysis="both", model="binomial",clusteridentify=FALSE)
##            model analysistype numclust.AIC numclust.AICc numclust.BIC
## 1       Binomial        Space           16            16           16
## 2       Binomial   Space-Time           16            16           16
## 3 Quasi-Binomial        Space           15            15           12
## 4 Quasi-Binomial   Space-Time           15            15           12

Case-Control Data with Covariates

We continue with the case-control data from Japan. We will again checkout the head() of the dataframe, ccsjbc.

head(ccsjbc)
##     id period numcases     n   covar1     covar2 covar3 covar4 covar5
## 1 9201   7906     2944 10050 458.5877 0.01894360     34      1      0
## 2 9201   7922     2997 10032 470.2921 0.08078908     40      1      0
## 3 9201   7938     3044 10128 490.6435 0.02267590     32      0      1
## 4 9201   7954     3031 10227 501.6789 0.03259346     35      0      0
## 5 9201   7970     2976  9739 531.8578 0.01600942     32      0      1
## 6 9202   7906     3097 10565 521.2707 0.05811436     31      0      0

There are 5 additional covariates, covar1 - covar5 that we will want to adjust for as unpenalized terms.

We now set the argument covars=TRUE to tell clusso() to identify the covariates and to include them in as unpenalized terms.

system.time(resccscovars<- clusso(ccsjbc, expected = n, observed=numcases,timeperiod = period, id=id,
                                  covars=TRUE, x= xx,y = yy, rMax =  rMax, utm=TRUE,
                                  analysis="both", model="binomial",maxclust=11, collapsetime=FALSE))
## Running binomial both spatial and spatio-temporal model(s).
## Creating radius-based potential clusters
## Running with covariates
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Running with covariates
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2179.5856
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## Running with covariates
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Running with covariates
## Number of potential clusters to scan through:  66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2179.5856
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## All models ran successfully
##    user  system elapsed 
##   48.97    0.44   53.26
clussopretty(resccscovars, analysis="both", model="binomial",clusteridentify = FALSE)
##            model analysistype numclust.AIC numclust.AICc numclust.BIC
## 1       Binomial        Space           12            12           12
## 2       Binomial   Space-Time           12            12           12
## 3 Quasi-Binomial        Space           11            11           11
## 4 Quasi-Binomial   Space-Time           11            11           11

We can also extract the coefficients for the unpenalized terms and undo the log-link by exponentiating:

out <- clussopretty(resccscovars, analysis="both", model="poisson",covars=TRUE)
out$table.coefs
##         IC         model analysistype time_period1 time_period2 time_period3
## 1   (Q)AIC       Poisson        Space            0            0            0
## 2   (Q)AIC       Poisson   Space-Time            0            0            0
## 3   (Q)AIC       Poisson        Space            0            0            0
## 4   (Q)AIC       Poisson   Space-Time            0            0            0
## 5  (Q)AICc Quasi-Poisson        Space            0            0            0
## 6  (Q)AICc Quasi-Poisson   Space-Time            0            0            0
## 7  (Q)AICc Quasi-Poisson        Space            0            0            0
## 8  (Q)AICc Quasi-Poisson   Space-Time            0            0            0
## 9   (Q)BIC       Poisson        Space            0            0            0
## 10  (Q)BIC       Poisson   Space-Time            0            0            0
## 11  (Q)BIC       Poisson        Space            0            0            0
## 12  (Q)BIC       Poisson   Space-Time            0            0            0
## 13  (Q)AIC Quasi-Poisson        Space            0            0            0
## 14  (Q)AIC Quasi-Poisson   Space-Time            0            0            0
## 15  (Q)AIC Quasi-Poisson        Space            0            0            0
## 16  (Q)AIC Quasi-Poisson   Space-Time            0            0            0
## 17 (Q)AICc       Poisson        Space            0            0            0
## 18 (Q)AICc       Poisson   Space-Time            0            0            0
## 19 (Q)AICc       Poisson        Space            0            0            0
## 20 (Q)AICc       Poisson   Space-Time            0            0            0
## 21  (Q)BIC Quasi-Poisson        Space            0            0            0
## 22  (Q)BIC Quasi-Poisson   Space-Time            0            0            0
## 23  (Q)BIC Quasi-Poisson        Space            0            0            0
## 24  (Q)BIC Quasi-Poisson   Space-Time            0            0            0
##    time_period4 time_period5       covar1       covar2       covar3      covar4
## 1             0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 2             0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 3             0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 4             0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 5             0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 6             0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 7             0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 8             0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 9             0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 10            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 11            0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 12            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 13            0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 14            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 15            0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 16            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 17            0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 18            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 19            0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 20            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 21            0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 22            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 23            0            0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 24            0            0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
##         covar5
## 1  -0.01941128
## 2  -0.01937164
## 3  -0.01941128
## 4  -0.01937164
## 5  -0.01941128
## 6  -0.01937164
## 7  -0.01941128
## 8  -0.01937164
## 9  -0.01941128
## 10 -0.01937164
## 11 -0.01941128
## 12 -0.01937164
## 13 -0.01941128
## 14 -0.01937164
## 15 -0.01941128
## 16 -0.01937164
## 17 -0.01941128
## 18 -0.01937164
## 19 -0.01941128
## 20 -0.01937164
## 21 -0.01941128
## 22 -0.01937164
## 23 -0.01941128
## 24 -0.01937164
out$table.expcoefs
##         IC         model analysistype time_period1 time_period2 time_period3
## 1   (Q)AIC       Poisson        Space            1            1            1
## 2   (Q)AIC       Poisson   Space-Time            1            1            1
## 3   (Q)AIC       Poisson        Space            1            1            1
## 4   (Q)AIC       Poisson   Space-Time            1            1            1
## 5  (Q)AICc Quasi-Poisson        Space            1            1            1
## 6  (Q)AICc Quasi-Poisson   Space-Time            1            1            1
## 7  (Q)AICc Quasi-Poisson        Space            1            1            1
## 8  (Q)AICc Quasi-Poisson   Space-Time            1            1            1
## 9   (Q)BIC       Poisson        Space            1            1            1
## 10  (Q)BIC       Poisson   Space-Time            1            1            1
## 11  (Q)BIC       Poisson        Space            1            1            1
## 12  (Q)BIC       Poisson   Space-Time            1            1            1
## 13  (Q)AIC Quasi-Poisson        Space            1            1            1
## 14  (Q)AIC Quasi-Poisson   Space-Time            1            1            1
## 15  (Q)AIC Quasi-Poisson        Space            1            1            1
## 16  (Q)AIC Quasi-Poisson   Space-Time            1            1            1
## 17 (Q)AICc       Poisson        Space            1            1            1
## 18 (Q)AICc       Poisson   Space-Time            1            1            1
## 19 (Q)AICc       Poisson        Space            1            1            1
## 20 (Q)AICc       Poisson   Space-Time            1            1            1
## 21  (Q)BIC Quasi-Poisson        Space            1            1            1
## 22  (Q)BIC Quasi-Poisson   Space-Time            1            1            1
## 23  (Q)BIC Quasi-Poisson        Space            1            1            1
## 24  (Q)BIC Quasi-Poisson   Space-Time            1            1            1
##    time_period4 time_period5    covar1   covar2    covar3    covar4    covar5
## 1             1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 2             1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 3             1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 4             1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 5             1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 6             1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 7             1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 8             1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 9             1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 10            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 11            1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 12            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 13            1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 14            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 15            1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 16            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 17            1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 18            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 19            1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 20            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 21            1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 22            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 23            1            1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 24            1            1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
out$table.clusters
##           model analysistype numclust.AIC numclust.AICc numclust.BIC
## 1       Poisson        Space           12            12           12
## 2       Poisson   Space-Time           12            12           12
## 3 Quasi-Poisson        Space           11            11           11
## 4 Quasi-Poisson   Space-Time           11            11           11