McLaughlin Colleen C, Boscoe Francis P
New York State Cancer Registry, New York State Department of Health, Corning Tower Room 536, Empire State Plaza, Albany, NY 12237, USA.
Health Place. 2007 Mar;13(1):152-63. doi: 10.1016/j.healthplace.2005.11.003. Epub 2006 Jan 6.
Monte Carlo methods are commonly used to assess the statistical significance of disease clusters. This usually involves permuting the observed outcome measure, such as the rate of disease, across the geographic units within the study area. When the variance of the disease rates is heterogeneous, however, randomizing the disease rate across the geographic units over-estimates the p-values in areas of low variance and under-estimates the p-values in areas of high variance. This bias results in under-ascertainment of clusters in urban areas and over-ascertainment of clusters in rural areas. As an alternative, randomizing the number of cases of disease or deaths in proportion to the population at risk preserves the variance structure of the study area and therefore yields unbiased statistical inference. Using the local Moran's I statistic, we compare results from randomizing rates with those from randomizing case counts for county-level prostate cancer mortality data for the United States and ZIP Code-level prostate cancer incidence data for New York State.
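The contrast between the two null models can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the ring-shaped neighbour structure, the one-sided pseudo p-value, and the use of a full (rather than conditional) permutation are all assumptions made for illustration. Case counts are reallocated with a multinomial draw whose probabilities are proportional to population, which is one straightforward reading of "randomizing the number of cases proportional to the population at risk".

```python
import numpy as np

def ring_weights(n):
    """Hypothetical neighbour structure: each unit borders the next and previous unit."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5  # row-standardized: weights in each row sum to 1
        W[i, (i + 1) % n] = 0.5
    return W

def local_morans_i(rates, W):
    """Local Moran's I for each unit: z_i * sum_j(w_ij * z_j) / m2."""
    z = rates - rates.mean()
    m2 = (z ** 2).mean()
    return z * (W @ z) / m2

def permute_rates(rates, rng):
    """Null model 1: shuffle observed rates across units, ignoring population size."""
    return rng.permutation(rates)

def randomize_cases(cases, population, rng):
    """Null model 2: reallocate all cases with probability proportional to population."""
    probs = population / population.sum()
    sim_cases = rng.multinomial(int(cases.sum()), probs)
    return sim_cases / population

def pseudo_p_values(cases, population, W, null="cases", n_sims=999, seed=0):
    """One-sided Monte Carlo pseudo p-values for high local Moran's I (assumed test form)."""
    rng = np.random.default_rng(seed)
    rates = cases / population
    observed = local_morans_i(rates, W)
    exceed = np.zeros(len(rates))
    for _ in range(n_sims):
        if null == "rates":
            sim = permute_rates(rates, rng)
        else:
            sim = randomize_cases(cases, population, rng)
        exceed += local_morans_i(sim, W) >= observed
    return (exceed + 1) / (n_sims + 1)

# Synthetic example (made-up numbers): 10 small "rural" and 10 large "urban" units.
population = np.array([100] * 10 + [100_000] * 10)
data_rng = np.random.default_rng(42)
cases = data_rng.binomial(population, 0.01)
W = ring_weights(len(population))

p_rates = pseudo_p_values(cases, population, W, null="rates")
p_cases = pseudo_p_values(cases, population, W, null="cases")
```

Under the case-count null, a simulated rate in a unit with population n has variance of roughly p/n, so small units vary far more than large ones, as they do in real data. Under the rate-permutation null, every unit draws from the same pool of observed rates regardless of its population, which is the source of the bias described above.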