First, load the coefclust
package.
set.seed(1)
library(coefclust)
Load in the toy dataset for the spatio-temporal analysis that comes from the package, SE_FakeData_SpTM. Explore the dataset using the str()
function:
data("SE_FakeData_SpTm")
str(SE_FakeData_SpTm)
## 'data.frame': 616 obs. of 11 variables:
## $ State : Factor w/ 7 levels "Alabama","Florida",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ County: Factor w/ 438 levels "Abbeville","Adams",..: 17 21 24 36 40 51 54 58 69 77 ...
## $ FIPS : int 1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ...
## $ long : num -86.6 -87.7 -85.4 -87.1 -86.6 ...
## $ lat : num 32.5 30.7 31.9 33 34 ...
## $ x1 : num -1.692 0.407 -3.864 7.074 3.073 ...
## $ x2 : num -2.141 -0.0214 -4.3699 5.7659 3.4897 ...
## $ x3 : num -3.213 0.759 -2.858 5.157 0.564 ...
## $ y1 : num -1.824 0.567 -0.144 16.095 -0.67 ...
## $ y2 : num -2.044 -0.212 -0.81 16.156 -1.007 ...
## $ y3 : num 2.489 1.072 -0.161 1.858 1.154 ...
This dataset has 11 variables: State, County, FIPS code (as an integer value), long (longitude of the centroid for each county), lat (latitude of the centroid for each county), 3 covariates (x1, x2, x3), and three responses (y1, y2, y2).
Next, we extract the longitude and latitude into separate objects, long
and lat
:
long <- SE_FakeData_SpTm$long; lat <- SE_FakeData_SpTm$lat
We must first prepare the data into lists. First we prepare the list yList
, where each element of the list will contain a single series of the three responses:
yList <-list()
yList[[1]] <- SE_FakeData_SpTm$y1
yList[[2]] <- SE_FakeData_SpTm$y2
yList[[3]] <- SE_FakeData_SpTm$y3
str(yList)
## List of 3
## $ : num [1:616] -1.824 0.567 -0.144 16.095 -0.67 ...
## $ : num [1:616] -2.044 -0.212 -0.81 16.156 -1.007 ...
## $ : num [1:616] 2.489 1.072 -0.161 1.858 1.154 ...
Next, we prepare the list XList
. Each element of XList
will have two columns. The first vector is 1 repeated the length of the number of observations. This first column corresponds to the intercept, \(\beta_0\). The second column corresponds to the x
covariate.
XList <-list()
XList[[1]] <- cbind(rep(1,length(long)), SE_FakeData_SpTm$x1)
XList[[2]] <- cbind(rep(1,length(long)), SE_FakeData_SpTm$x2)
XList[[3]] <- cbind(rep(1,length(long)), SE_FakeData_SpTm$x3)
str(XList)
## List of 3
## $ : num [1:616, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
## $ : num [1:616, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
## $ : num [1:616, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
We start by setting the maximum radius (MR
) to 300 and the number of simulation (M
) to 2. In a full simulation, we recommend setting M=1000
, but have only set it to 2 here for illustration:
MR <- 300
M <- 2
We use the function Find.Clusters.SI.ST()
for detectiong multiple clusters sequentially using simultaneous detection. This function will take the list of responses we created (yList
) as the first argument and the list of covariates (XList
) as the second argument. The lists of resposnes and covariates are followed by the longitude coordinate, latitude coordinate, the maximum radius for potential clusters MR
, the number of simulations M
, we set overlap=FALSE
to not allow for overlapping clusters, and set alpha=0.05
as the Type I error rate:
#Find Multiple Clusters Sequentially via the Simulatneous Detection
Clusters_simST <- Find.Clusters.SI.ST(yList, XList, long, lat, MR, M, overlap=FALSE, alpha=0.05)
str(Clusters_simST)
## List of 3
## $ Clusters : num [1, 1:5] 11 92.8697 0.3333 0.0352 16
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:5] "center" "radius" "pval" "elapsed" ...
## $ Coef :List of 2
## ..$ Coeff_History:List of 3
## .. ..$ : num [1:2, 1] 0.0395 0.062
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. .. ..$ : chr "0"
## .. ..$ : num [1:2, 1] 0.1129 0.0619
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. .. ..$ : chr "0"
## .. ..$ : num [1:2, 1] 0.1091 0.0186
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. .. ..$ : chr "0"
## ..$ Coeff_Table : num [1:2, 1:3] 0.0395 0.062 0.1129 0.0619 0.1091 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. ..$ : NULL
## $ Indicator:List of 1
## ..$ : logi [1:616] TRUE FALSE FALSE TRUE FALSE FALSE ...
Clusters_simST$Clusters
## center radius pval elapsed n_obs
## [1,] 11 92.86966 0.3333333 0.03516667 16
Clusters_simST$Coef
## $Coeff_History
## $Coeff_History[[1]]
## 0
## beta_0 0.03945850
## beta_1 0.06195994
##
## $Coeff_History[[2]]
## 0
## beta_0 0.11293727
## beta_1 0.06194135
##
## $Coeff_History[[3]]
## 0
## beta_0 0.10911750
## beta_1 0.01859798
##
##
## $Coeff_Table
## [,1] [,2] [,3]
## beta_0 0.03945850 0.11293727 0.10911750
## beta_1 0.06195994 0.06194135 0.01859798
table(Clusters_simST$Indicator)
##
## FALSE TRUE
## 600 16
The output is an object Clusters_simST
which is a list of 3 elements:
Clusters_simST$Clusters
(first element of list): provides the resulting identified cluster, with information about the center ID, radius, p-value for the cluster, elapsed computation time, and number of observations inside the detected cluster.Clusters_simST$Coef
(second element of list): provides estimates for beta_0
(intercept) and beta_1
(slope) for each time period.Clusters_simST$Indicator
(third element of list): is Boolean vector that is TRUE
if the given center is inside the cluster and FALSE
if it is not.To identify the cluster by it’s county name, we can column bind the information from the SE_FakeData_SpTm
dataset with the identified cluster information:
cbind(SE_FakeData_SpTm[Clusters_simST$Clusters[,1],1:2],Clusters_simST$Clusters)
## State County center radius pval elapsed n_obs
## 11 Alabama Chilton 11 92.86966 0.3333333 0.03516667 16
We use the functions Find.Clusters.TS.ST1()
, Find.Clusters.TS.ST2()
for detectiong multiple clusters sequentially using simultaneous detection. Like the Find.Clusters.SI.ST()
function for simultaneous detection above, this function will take the list of responses we created (yList
) as the first argument and the list of covariates (XList
) as the second argument. The lists of resposnes and covariates are followed by the longitude coordinate, latitude coordinate, the maximum radius for potential clusters MR
, the number of simulations M
, we set overlap=FALSE
to not allow for overlapping clusters, and set alpha=0.05
as the Type I error rate.
The results from the first stage (using Find.Clusters.TS.ST1()
) are stored as the object Clusters_ts1N
. This is then used as an argument for the function Find.Clusters.TS.ST2()
where we specify the clusters identified in the first stage as Cls1st=Clusters_ts1N
.
#Find Multiple Clusters Sequentially via the Two--Stage Detections
Clusters_ts1N <- Find.Clusters.TS.ST1(yList, XList, long, lat, MR, M, overlap=FALSE, alpha=0.05)
Clusters_ts2N <- Find.Clusters.TS.ST2(yList, XList, long, lat, MR, M, Cls1st=Clusters_ts1N, overlap=FALSE, alpha=0.05)
#Explore the spatio-temporal identified clusters (via two-stage detection)
str(Clusters_ts2N)
## List of 3
## $ Clusters : num [1, 1:6] 11 88.699 0.333 0.048 15 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:6] "center" "radius" "pval" "elapsed" ...
## $ Coef :List of 2
## ..$ Coeff_History:List of 3
## .. ..$ : num [1:2, 1] 0.0395 0.062
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. .. ..$ : chr "0"
## .. ..$ : num [1:2, 1] 0.1129 0.0619
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. .. ..$ : chr "0"
## .. ..$ : num [1:2, 1] 0.1091 0.0186
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. .. ..$ : chr "0"
## ..$ Coeff_Table : num [1:2, 1:3] 0.0395 0.062 0.1129 0.0619 0.1091 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "beta_0" "beta_1"
## .. .. ..$ : NULL
## $ Indicator:List of 1
## ..$ : logi [1:616] TRUE FALSE FALSE TRUE FALSE FALSE ...
Clusters_ts2N$Clusters
## center radius pval elapsed n_obs stage
## [1,] 11 88.69924 0.3333333 0.048 15 1
Clusters_ts2N$Coef
## $Coeff_History
## $Coeff_History[[1]]
## 0
## beta_0 0.03945850
## beta_1 0.06195994
##
## $Coeff_History[[2]]
## 0
## beta_0 0.11293727
## beta_1 0.06194135
##
## $Coeff_History[[3]]
## 0
## beta_0 0.10911750
## beta_1 0.01859798
##
##
## $Coeff_Table
## [,1] [,2] [,3]
## beta_0 0.03945850 0.11293727 0.10911750
## beta_1 0.06195994 0.06194135 0.01859798
table(Clusters_ts2N$Indicator)
##
## FALSE TRUE
## 592 24
The final output of the spatio-temporal two-stage detection is stored in the object Clusters_ts2N
, which is a list of 3 elements:
Clusters_ts2N$Clusters
(first element of list): provides the resulting identified cluster, with information about the center ID, radius, p-value, elapsed computation time, and number of observations in the cluster.Clusters_ts2N$Coef
(second element of list): provides estimates for beta_0
(intercept) and beta_1
(slope) for each time period.Clusters_ts2N$Indicator
(third element of list): is Boolean vector that is TRUE
if the given center is inside the cluster and FALSE
if it is not.To identify the cluster by it’s county name, we can column bind the information from the SE_FakeData_SpTm
dataset with the identified cluster information:
cbind(SE_FakeData_SpTm[Clusters_ts2N$Clusters[,1],1:2],Clusters_ts2N$Clusters)
## State County center radius pval elapsed n_obs stage
## 11 Alabama Chilton 11 88.69924 0.3333333 0.048 15 1
In order to adjust for multiple testing across the two stages, the Bonferroni correction can be applied to the p-value associated with the detected cluster. To do so, we specify alpha=(0.05/2)
:
### With the Bonferroni correction
Clusters_ts1B <- Find.Clusters.TS.ST1(yList, XList, long, lat, MR, M, overlap=FALSE, alpha=(0.05/2))
Clusters_ts2B <- Find.Clusters.TS.ST2(yList, XList, long, lat, MR, M, Cls1st=Clusters_ts1B, overlap=FALSE, alpha=(0.05/2))
To identify the cluster by it’s county name, we can column bind the information from the SE_FakeData_SpTm
dataset with the identified cluster information:
cbind(SE_FakeData_SpTm[Clusters_ts2B$Clusters[,1],1:2],Clusters_ts2B$Clusters)
## State County center radius pval elapsed n_obs stage
## 11 Alabama Chilton 11 88.69924 0.3333333 0.05 15 1