Oliver M Norman, Matthews Kevin A, Siadaty Mir, Hauck Fern R, Pickle Linda W
Department of Family Medicine, University of Virginia, Charlottesville, VA, USA.
Int J Health Geogr. 2005 Nov 10;4:29. doi: 10.1186/1476-072X-4-29.
This article describes the geographic bias that arises in GIS analyses when data are unrepresentative because of missing geocodes, using as an example a spatial analysis of prostate cancer incidence among whites and African Americans in Virginia, 1990-1999. Statistical tests for clustering were performed and the resulting clusters were mapped. The patterns of missing census tract identifiers among the cases were examined with generalized linear regression models.
The county of residence was known for all cases, and 26,338 (74%) of these cases were geocoded successfully to census tracts. Cluster maps showed markedly different patterns depending on whether all cases were used or only those geocoded to a census tract. Multivariate regression analysis showed that, in the most rural counties (where the missing data were concentrated), the percentage of a county's population over age 64 and the percentage with less than a high school education were each independently associated with a higher percentage of missing geocodes.
We found statistically significant pattern differences resulting from spatially non-random differences in geocoding completeness across Virginia. Appropriate interpretation of maps, therefore, requires an understanding of this phenomenon, which we call "cartographic confounding."
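As a minimal sketch of the completeness check that underlies this analysis, the following Python computes the percentage of cases lacking a census tract identifier in each county, along with the overall share geocoded. The function name `missing_geocode_pct`, the record layout, and the two counties are hypothetical illustrations, not the study's data or code.

```python
from collections import defaultdict


def missing_geocode_pct(cases):
    """Return {county: percent of cases lacking a census tract identifier}.

    `cases` is an iterable of (county, tract) pairs; tract is None when
    geocoding to the census tract level failed. (Hypothetical layout.)
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    for county, tract in cases:
        totals[county] += 1
        if tract is None:
            missing[county] += 1
    return {county: 100.0 * missing[county] / totals[county] for county in totals}


# Hypothetical records for illustration only: every case has a county,
# but some have no census tract.
cases = [
    ("Urban A", "0001"), ("Urban A", "0002"), ("Urban A", "0001"), ("Urban A", None),
    ("Rural B", None), ("Rural B", "0101"), ("Rural B", None), ("Rural B", None),
]

pct_missing = missing_geocode_pct(cases)

# Overall share of cases geocoded to a tract (the study reports 74%).
overall_geocoded = 100.0 * sum(tract is not None for _, tract in cases) / len(cases)

print(pct_missing)
print(overall_geocoded)
```

A wide gap in the per-county percentages, as between the two hypothetical counties here, is the kind of spatially non-random completeness that could then be modeled against county characteristics such as age structure and education.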