Department of Mathematics and Statistics, American University, Washington, DC 20016, USA.
Int J Health Geogr. 2009 Oct 12;8:55. doi: 10.1186/1476-072X-8-55.
The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods in two extreme cases, global clustering evaluation and local anomaly (outlier) detection, has not been thoroughly investigated.
We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*(pop), and cluster detection methods, such as local Moran's I and the elliptic version of SaTScan, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in purely spatial analysis. We illustrate Tango's MEET and the elliptic version of SaTScan on 1987-2004 HIV mortality data and 1950-1969 lung cancer mortality data from the United States.
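As a point of reference for the global statistics named above, the following is a minimal sketch of the classical global Moran's I with a permutation test. It is not the code used in the study: the rook-adjacency grid, the helper names (`rook_weights`, `morans_i`, `permutation_p_value`), and the toy data are assumptions made for illustration, and the sketch ignores the population adjustment that Oden's I*(pop) and Tango's MEET apply.

```python
import numpy as np


def rook_weights(rows, cols):
    """Binary adjacency matrix for a rows x cols grid (hypothetical layout).

    Two cells are neighbors when they share an edge.
    """
    n = rows * cols
    w = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:  # neighbor to the right
                w[i, i + 1] = w[i + 1, i] = 1.0
            if r + 1 < rows:  # neighbor below
                w[i, i + cols] = w[i + cols, i] = 1.0
    return w


def morans_i(x, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def permutation_p_value(x, w, n_perm=999, seed=0):
    """One-sided permutation p-value for positive spatial autocorrelation.

    Values are shuffled across regions to approximate the null
    distribution of I under spatial randomness.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = morans_i(x, w)
    perms = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p


# Toy example: a 4 x 4 grid whose values increase from the top row to the bottom row.
w = rook_weights(4, 4)
gradient = np.repeat(np.arange(4), 4)
i_obs, p_val = permutation_p_value(gradient, w)
print(f"Moran's I = {i_obs:.3f}, permutation p = {p_val:.3f}")
```

Repeating such a test over many simulated data sets and recording the fraction of rejections is one common way to estimate power, which is the quantity the comparison above reports for each method.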
For simulated data with outlier patterns, Tango's MEET, Moran's I, and I*(pop) had power less than 0.2, whereas SaTScan had power around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*(pop) (with 50% of the total population as the maximum search window) had power close to 1, SaTScan had power around 0.7-0.8, and Moran's I had power around 0.2-0.3. In the real data examples, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data.
The elliptic version of SaTScan is more efficient for outlier detection than the other methods evaluated in this article. Tango's MEET and Oden's I*(pop) perform best in global clustering scenarios among the selected methods. SaTScan should be used with caution on data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan.