Please report any issues here.
Messages from clusso
have been suppressed for this tutorial, but there are many messages generated that we hope the user will find helpful. Feedback is welcome here.
Load clusso
package:
library("clusso")
If you haven't already, you can install clusso
directly from GitHub:
library("devtools")
devtools::install_github("mkamenet3/clusso")
clusso
with case-control dataIn this example, we will use simulated case-control data of across 208 prefects (municipalities) in Japan across 5 time periods.
We load four data sets that come with the clusso
package:
#load data
data("ccsjbc")
data("utmJapan")
These data sets contain:
ccsjbc
: data set containing a unique identifier for each prefect centroid (id), period of observation (categorical variable with 5 levels, period), number of cases per each geographic unit (numcases), the total number of cases and controls from each prefect (n). These data have been simulated.utmJapan
: data set containing a unique identifier for each prefect centroid (id), x-coordinate in UTM (utmx), and y-coordinate in UTM (utmy).To explore the 4 data sets, we apply the head()
function to each of the data sets.
#inspect
head(utmJapan)
## id utmx utmy
## 1 9201 399786.6 4047756
## 2 9202 360917.0 4023885
## 3 9203 385175.1 4025749
## 4 9204 371603.4 4018172
## 5 9205 388154.2 4047900
## 6 9206 375023.3 4068053
head(ccsjbc)
## id period numcases n covar1 covar2 covar3 covar4 covar5
## 1 9201 7906 2944 10050 458.5877 0.01894360 34 1 0
## 2 9201 7922 2997 10032 470.2921 0.08078908 40 1 0
## 3 9201 7938 3044 10128 490.6435 0.02267590 32 0 1
## 4 9201 7954 3031 10227 501.6789 0.03259346 35 0 0
## 5 9201 7970 2976 9739 531.8578 0.01600942 32 0 1
## 6 9202 7906 3097 10565 521.2707 0.05811436 31 0 0
As a data-cleaning step, we first convert period
to a factor in the data frame jbc
:
ccsjbc$period <- as.factor(ccsjbc$period)
ccsjbc$id <- as.factor(ccsjbc$id)
When using clusso
, there are certain global parameters that need to be set.
We take the easting and northing coordinates from the utmJapan
data set, set the easting coordinate, utmx
, to xx
and the northing coordinate, utmy
, to yy
. Each is divided by 1000 to change the scale from meters to kilometers.
xx <- utmJapan$utmx/1000
yy <- utmJapan$utmy/1000
Below, we set the maximum radius for a cluster to 20 km (rMax=20
) based on scientific knowledge of the area and our definition of a meaningful cluster, set the number of unique time periods to 5 (Time=5
).
rMax <- 20
Time <- 5
clusso
The argument model
is the type of model to be used. In this case-control study example, we specify the binomial model.
The output of clusso()
is assigned to the object resreal
. resreal
is a large list of lists. The recommended way to explore the results is to explore the names(resreal)
and select each sub-list of interest.
head(ccsjbc)
## id period numcases n covar1 covar2 covar3 covar4 covar5
## 1 9201 7906 2944 10050 458.5877 0.01894360 34 1 0
## 2 9201 7922 2997 10032 470.2921 0.08078908 40 1 0
## 3 9201 7938 3044 10128 490.6435 0.02267590 32 0 1
## 4 9201 7954 3031 10227 501.6789 0.03259346 35 0 0
## 5 9201 7970 2976 9739 531.8578 0.01600942 32 0 1
## 6 9202 7906 3097 10565 521.2707 0.05811436 31 0 0
system.time(resrealccs <- clusso(df=ccsjbc, expected = n, observed = numcases,
timeperiod = period,id=id,
covars=FALSE, x= xx,y = yy,
rMax = rMax, utm=TRUE,
analysis="both", model="binomial",
maxclust=15, collapsetime = FALSE))
## Running binomial both spatial and spatio-temporal model(s).
## Creating radius-based potential clusters
## No covariates found
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## No covariates found
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2169.4765
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## No covariates found
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## No covariates found
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2169.4765
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## All models ran successfully
## user system elapsed
## 42.91 0.21 43.28
clussopretty(resrealccs, analysis="both", model="binomial",clusteridentify=FALSE)
## model analysistype numclust.AIC numclust.AICc numclust.BIC
## 1 Binomial Space 16 16 16
## 2 Binomial Space-Time 16 16 16
## 3 Quasi-Binomial Space 15 15 12
## 4 Quasi-Binomial Space-Time 15 15 12
We continue with the case-control data from Japan. We will again checkout the head()
of the dataframe, ccsjbc
.
head(ccsjbc)
## id period numcases n covar1 covar2 covar3 covar4 covar5
## 1 9201 7906 2944 10050 458.5877 0.01894360 34 1 0
## 2 9201 7922 2997 10032 470.2921 0.08078908 40 1 0
## 3 9201 7938 3044 10128 490.6435 0.02267590 32 0 1
## 4 9201 7954 3031 10227 501.6789 0.03259346 35 0 0
## 5 9201 7970 2976 9739 531.8578 0.01600942 32 0 1
## 6 9202 7906 3097 10565 521.2707 0.05811436 31 0 0
There are 5 additional covariates, covar1
- covar5
that we will want to adjust for as unpenalized terms.
We now set the argument covars=TRUE
to tell clusso()
to identify the covariates and to include them in as unpenalized terms.
system.time(resccscovars<- clusso(ccsjbc, expected = n, observed=numcases,timeperiod = period, id=id,
covars=TRUE, x= xx,y = yy, rMax = rMax, utm=TRUE,
analysis="both", model="binomial",maxclust=11, collapsetime=FALSE))
## Running binomial both spatial and spatio-temporal model(s).
## Creating radius-based potential clusters
## Running with covariates
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Running with covariates
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2179.5856
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## Running with covariates
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Running with covariates
## Number of potential clusters to scan through: 66870
## Path selection: information criteria
## Lasso complete - extracting estimates and paths
## Overdispersion estimate: 2179.5856
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## The number of clusters selected by at least one criterion is equal to maxclust. You may want to increase maxclust.
## All models ran successfully
## user system elapsed
## 48.97 0.44 53.26
clussopretty(resccscovars, analysis="both", model="binomial",clusteridentify = FALSE)
## model analysistype numclust.AIC numclust.AICc numclust.BIC
## 1 Binomial Space 12 12 12
## 2 Binomial Space-Time 12 12 12
## 3 Quasi-Binomial Space 11 11 11
## 4 Quasi-Binomial Space-Time 11 11 11
We can also extract the coefficients for the unpenalized terms and undo the log-link by exponentiating:
out <- clussopretty(resccscovars, analysis="both", model="poisson",covars=TRUE)
out$table.coefs
## IC model analysistype time_period1 time_period2 time_period3
## 1 (Q)AIC Poisson Space 0 0 0
## 2 (Q)AIC Poisson Space-Time 0 0 0
## 3 (Q)AIC Poisson Space 0 0 0
## 4 (Q)AIC Poisson Space-Time 0 0 0
## 5 (Q)AICc Quasi-Poisson Space 0 0 0
## 6 (Q)AICc Quasi-Poisson Space-Time 0 0 0
## 7 (Q)AICc Quasi-Poisson Space 0 0 0
## 8 (Q)AICc Quasi-Poisson Space-Time 0 0 0
## 9 (Q)BIC Poisson Space 0 0 0
## 10 (Q)BIC Poisson Space-Time 0 0 0
## 11 (Q)BIC Poisson Space 0 0 0
## 12 (Q)BIC Poisson Space-Time 0 0 0
## 13 (Q)AIC Quasi-Poisson Space 0 0 0
## 14 (Q)AIC Quasi-Poisson Space-Time 0 0 0
## 15 (Q)AIC Quasi-Poisson Space 0 0 0
## 16 (Q)AIC Quasi-Poisson Space-Time 0 0 0
## 17 (Q)AICc Poisson Space 0 0 0
## 18 (Q)AICc Poisson Space-Time 0 0 0
## 19 (Q)AICc Poisson Space 0 0 0
## 20 (Q)AICc Poisson Space-Time 0 0 0
## 21 (Q)BIC Quasi-Poisson Space 0 0 0
## 22 (Q)BIC Quasi-Poisson Space-Time 0 0 0
## 23 (Q)BIC Quasi-Poisson Space 0 0 0
## 24 (Q)BIC Quasi-Poisson Space-Time 0 0 0
## time_period4 time_period5 covar1 covar2 covar3 covar4
## 1 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 2 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 3 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 4 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 5 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 6 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 7 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 8 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 9 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 10 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 11 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 12 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 13 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 14 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 15 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 16 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 17 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 18 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 19 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 20 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 21 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 22 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## 23 0 0 -0.001121677 0.0001469638 -0.006516330 -0.01492126
## 24 0 0 -0.001121032 0.0001452889 -0.006522668 -0.01486178
## covar5
## 1 -0.01941128
## 2 -0.01937164
## 3 -0.01941128
## 4 -0.01937164
## 5 -0.01941128
## 6 -0.01937164
## 7 -0.01941128
## 8 -0.01937164
## 9 -0.01941128
## 10 -0.01937164
## 11 -0.01941128
## 12 -0.01937164
## 13 -0.01941128
## 14 -0.01937164
## 15 -0.01941128
## 16 -0.01937164
## 17 -0.01941128
## 18 -0.01937164
## 19 -0.01941128
## 20 -0.01937164
## 21 -0.01941128
## 22 -0.01937164
## 23 -0.01941128
## 24 -0.01937164
out$table.expcoefs
## IC model analysistype time_period1 time_period2 time_period3
## 1 (Q)AIC Poisson Space 1 1 1
## 2 (Q)AIC Poisson Space-Time 1 1 1
## 3 (Q)AIC Poisson Space 1 1 1
## 4 (Q)AIC Poisson Space-Time 1 1 1
## 5 (Q)AICc Quasi-Poisson Space 1 1 1
## 6 (Q)AICc Quasi-Poisson Space-Time 1 1 1
## 7 (Q)AICc Quasi-Poisson Space 1 1 1
## 8 (Q)AICc Quasi-Poisson Space-Time 1 1 1
## 9 (Q)BIC Poisson Space 1 1 1
## 10 (Q)BIC Poisson Space-Time 1 1 1
## 11 (Q)BIC Poisson Space 1 1 1
## 12 (Q)BIC Poisson Space-Time 1 1 1
## 13 (Q)AIC Quasi-Poisson Space 1 1 1
## 14 (Q)AIC Quasi-Poisson Space-Time 1 1 1
## 15 (Q)AIC Quasi-Poisson Space 1 1 1
## 16 (Q)AIC Quasi-Poisson Space-Time 1 1 1
## 17 (Q)AICc Poisson Space 1 1 1
## 18 (Q)AICc Poisson Space-Time 1 1 1
## 19 (Q)AICc Poisson Space 1 1 1
## 20 (Q)AICc Poisson Space-Time 1 1 1
## 21 (Q)BIC Quasi-Poisson Space 1 1 1
## 22 (Q)BIC Quasi-Poisson Space-Time 1 1 1
## 23 (Q)BIC Quasi-Poisson Space 1 1 1
## 24 (Q)BIC Quasi-Poisson Space-Time 1 1 1
## time_period4 time_period5 covar1 covar2 covar3 covar4 covar5
## 1 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 2 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 3 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 4 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 5 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 6 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 7 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 8 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 9 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 10 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 11 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 12 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 13 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 14 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 15 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 16 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 17 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 18 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 19 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 20 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 21 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 22 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
## 23 1 1 0.9988790 1.000147 0.9935049 0.9851895 0.9807759
## 24 1 1 0.9988796 1.000145 0.9934986 0.9852481 0.9808148
out$table.clusters
## model analysistype numclust.AIC numclust.AICc numclust.BIC
## 1 Poisson Space 12 12 12
## 2 Poisson Space-Time 12 12 12
## 3 Quasi-Poisson Space 11 11 11
## 4 Quasi-Poisson Space-Time 11 11 11