Sonderman Jennifer S, Mumma Michael T, Cohen Sarah S, Cope Elizabeth L, Blot William J, Signorello Lisa B
International Epidemiology Institute, Rockville, MD 20850, USA.
Geospat Health. 2012 May;6(2):273-84. doi: 10.4081/gh.2012.145.
To enable spatial analyses within a large, prospective cohort study of nearly 86,000 adults enrolled in a 12-state area of the southeastern United States of America between 2002 and 2009, a multi-stage geocoding protocol was developed to efficiently maximize the proportion of participants assigned an address-level geographic coordinate. Addresses were parsed, cleaned and standardized before a combination of automated and interactive geocoding tools was applied. Our full protocol increased the non-Post Office (PO) Box match rate from 74.5% to 97.6%. Overall, we geocoded 99.96% of participant addresses, with only 5.2% at the ZIP code centroid level (2.8% PO Box and 2.3% non-PO Box addresses). One key to reducing the need for interactive geocoding was the use of multiple base maps. Still, addresses in areas with a population density of <44 persons/km² were much more likely to require resource-intensive interactive geocoding than those in areas with >920 persons/km² (odds ratio (OR) = 5.24; 95% confidence interval (CI) = 4.23, 6.49), as were addresses collected from participants during in-person interviews compared with those collected by mailed questionnaire (OR = 1.83; 95% CI = 1.59, 2.11). This study demonstrates that population density and address ascertainment method can influence automated geocoding results, and that high success in address-level geocoding is achievable for large-scale studies covering wide geographical areas.
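The staged approach described above (standardize, try automated matching against several base maps, then fall back to interactive review or a ZIP code centroid) can be sketched roughly as follows. This is a minimal illustration and not the authors' protocol: the function names, the suffix table, the PO Box pattern, the status labels and the idea of passing each base map in as a callable are all assumptions made for the example.

```python
import re
from typing import Callable, Optional

Coord = tuple[float, float]                  # (latitude, longitude)
Geocoder = Callable[[str], Optional[Coord]]  # hypothetical: one callable per base map

# Assumed pattern for PO Box addresses ("PO Box", "P.O. Box", "Post Office Box").
PO_BOX = re.compile(r"^\s*(P\.?\s*O\.?|POST\s+OFFICE)\s*BOX\b", re.IGNORECASE)

# Tiny illustrative suffix table; a real one would be far longer.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}


def standardize(address: str) -> str:
    """Uppercase, strip punctuation, collapse whitespace, abbreviate suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    return " ".join(SUFFIXES.get(token, token) for token in cleaned.split())


def is_po_box(address: str) -> bool:
    """True when the address line starts with a PO Box designation."""
    return PO_BOX.search(address) is not None


def geocode(
    address: str,
    zip_code: str,
    geocoders: list[tuple[str, Geocoder]],
    zip_centroids: dict[str, Coord],
) -> tuple[str, Optional[Coord]]:
    """Return (status, coordinate) for a single address.

    PO Boxes have no street location, so they go straight to the ZIP centroid.
    Other addresses are tried against each base map in order; anything still
    unmatched is flagged for interactive (manual) review.
    """
    if is_po_box(address):
        centroid = zip_centroids.get(zip_code)
        return ("zip_centroid", centroid) if centroid else ("unmatched", None)

    standardized = standardize(address)
    for name, lookup in geocoders:
        coord = lookup(standardized)
        if coord is not None:
            return (f"address:{name}", coord)
    return ("interactive_review", None)
```

Ordering the automated base maps before the manual queue mirrors the abstract's point that multiple base maps reduced the need for interactive geocoding; in this sketch an address reaches review only after every automated lookup has returned no match.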