Texas Center for Geographic Information Science, Department of Geography, Texas State University-San Marcos, 601 University Drive, San Marcos, TX 78666, USA.
Int J Health Geogr. 2011 Mar 31;10:23. doi: 10.1186/1476-072X-10-23.
Kulldorff's spatial scan statistic has been one of the most widely used statistical methods for automatically detecting clusters in spatial data. One limitation of this method is that its search relies on scan windows with predefined shapes, so it cannot detect clusters with arbitrary shapes. We employ a new neighbor-expanding approach and introduce two new algorithms for detecting clusters with arbitrary shapes in spatial data: the maximum-likelihood-first (MLF) algorithm and the non-greedy growth (NGG) algorithm. We then compare the performance of these two algorithms with that of the spatial scan statistic (SaTScan), Tango's flexibly shaped spatial scan statistic (FlexScan), and Duczmal's simulated annealing (SA) method on two datasets. Finally, we apply the methods to examine clusters of murine typhus cases in South Texas from 1996 to 2006.
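To make the scoring and expansion ideas concrete, the sketch below computes Kulldorff's Poisson log-likelihood ratio (LLR) for a candidate zone and then grows a zone outward from a seed region by repeatedly adding the adjacent region that yields the highest LLR. This is a generic greedy neighbor-expansion written only for illustration, loosely in the spirit of a maximum-likelihood-first search. It is not the authors' MLF or NGG algorithm, and it omits the Monte Carlo significance testing that scan statistics normally rely on. The function names (`poisson_llr`, `grow_cluster`), the `max_size` cap, and the toy data are hypothetical, and region populations are assumed to be positive.

```python
import math


def poisson_llr(c, e, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for one candidate zone.

    c: observed cases inside the zone
    e: expected cases inside the zone (total_cases * zone_pop / total_pop)
    total_cases: total observed cases in the study area
    Zones without excess risk (c <= e) score 0.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside_c = total_cases - c
    outside_e = total_cases - e
    # 0 * log(0) is taken as 0 when every case falls inside the zone.
    outside = outside_c * math.log(outside_c / outside_e) if outside_c > 0 else 0.0
    return inside + outside


def grow_cluster(seed, neighbors, cases, pop, max_size):
    """Hypothetical greedy neighbor-expansion from a seed region.

    At each step, add the adjacent region whose inclusion gives the highest
    LLR, and return the best-scoring zone seen at any point during growth.
    """
    total_cases = sum(cases.values())
    total_pop = sum(pop.values())

    def score(zone):
        c = sum(cases[r] for r in zone)
        e = total_cases * sum(pop[r] for r in zone) / total_pop
        return poisson_llr(c, e, total_cases)

    zone = {seed}
    best_zone, best_llr = set(zone), score(zone)
    while len(zone) < max_size:
        frontier = {n for r in zone for n in neighbors[r]} - zone
        if not frontier:
            break
        # Sorting makes ties deterministic: max() keeps the first maximum.
        nxt = max(sorted(frontier), key=lambda n: score(zone | {n}))
        zone.add(nxt)
        current = score(zone)
        if current > best_llr:
            best_zone, best_llr = set(zone), current
    return best_zone, best_llr


if __name__ == "__main__":
    # Toy study area: five regions in a row, A-B-C-D-E, 100 people each.
    neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
                 "D": ["C", "E"], "E": ["D"]}
    cases = {"A": 1, "B": 1, "C": 8, "D": 9, "E": 1}
    pop = {r: 100 for r in cases}
    zone, llr = grow_cluster("C", neighbors, cases, pop, max_size=3)
    print(sorted(zone), round(llr, 2))  # -> ['C', 'D'] 8.66
```

Keeping the best zone seen during growth, rather than the last one, matters because adding a region can lower the ratio. In the toy example, extending {C, D} to three regions drops the LLR from about 8.66 to about 4.53, so the two-region zone is returned.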
Compared with the SaTScan and FlexScan methods, the two new algorithms were more flexible and more sensitive in detecting clusters with arbitrary shapes in the test datasets. Clusters detected by the MLF algorithm were statistically more significant than those detected by the NGG algorithm. However, the NGG algorithm appears to be more stable when the data contain no extreme cluster patterns. For the murine typhus data in South Texas, a large portion of the detected clusters were located in coastal counties, where environmental conditions and the socioeconomic status of some population groups were disadvantaged compared with those in counties that had no clusters of murine typhus cases.
The two new algorithms are effective in detecting the location and boundary of spatial clusters with arbitrary shapes. Additional research is needed to better understand the etiology of the concentration of murine typhus cases in some counties in South Texas.