Using imputation to provide location information for nongeocoded addresses.

Author affiliations

Department of Environmental Health Sciences and Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America.

Publication information

PLoS One. 2010 Feb 10;5(2):e8998. doi: 10.1371/journal.pone.0008998.

DOI:10.1371/journal.pone.0008998

PMID:20161766

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2818716/

Abstract

BACKGROUND

The importance of geography as a source of variation in health research continues to receive sustained attention in the literature. The inclusion of geographic information in such research often begins by adding data to a map which is predicated by some knowledge of location. A precise level of spatial information is conventionally achieved through geocoding, the geographic information system (GIS) process of translating mailing address information to coordinates on a map. The geocoding process is not without its limitations, though, since there is always a percentage of addresses which cannot be converted successfully (nongeocodable). This raises concerns regarding bias since traditionally the practice has been to exclude nongeocoded data records from analysis.

METHODOLOGY/PRINCIPAL FINDINGS

In this manuscript we develop and evaluate a set of imputation strategies for dealing with missing spatial information from nongeocoded addresses. The strategies are developed assuming a known zip code with increasing use of collateral information, namely the spatial distribution of the population at risk. Strategies are evaluated using prostate cancer data obtained from the Maryland Cancer Registry. We consider total case enumerations at the Census county, tract, and block group level as the outcome of interest when applying and evaluating the methods. Multiple imputation is used to provide estimated total case counts based on complete data (geocodes plus imputed nongeocodes) with a measure of uncertainty. Results indicate that the imputation strategy based on using available population-based age, gender, and race information performed the best overall at the county, tract, and block group levels.

CONCLUSIONS/SIGNIFICANCE

The procedure allows the potentially biased and likely underreported outcome (case enumerations based only on the geocoded records) to be presented alongside a statistically adjusted count (the imputed count) with a measure of uncertainty, both based on all the case data: the geocodes and the imputed nongeocodes. Similar strategies can be applied in other analysis settings.
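The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the data layout, the zip codes "Z1" and "Z2", the tract labels, and all counts and weights are hypothetical. It assumes each nongeocoded case has a known zip code and is assigned at random to one of the areas overlapping that zip code, with probability proportional to a population-at-risk weight. The assignment is repeated several times, and the completed counts are summarized by their mean and their between-imputation variance, which is a simplification of a full multiple-imputation variance calculation.

```python
import random
from collections import Counter
from statistics import mean, variance


def impute_counts(geocoded_counts, nongeocoded_by_zip, zip_weights,
                  n_imputations=20, seed=0):
    """Return {area: (mean completed count, between-imputation variance)}.

    geocoded_counts:    {area: number of geocoded cases}
    nongeocoded_by_zip: {zip code: number of nongeocoded cases}
    zip_weights:        {zip code: {area: population-at-risk weight}}
    All three structures are hypothetical, chosen for illustration only.
    """
    rng = random.Random(seed)
    areas = sorted(set(geocoded_counts)
                   | {a for w in zip_weights.values() for a in w})
    draws = {a: [] for a in areas}

    for _ in range(n_imputations):
        # Start each completed data set from the geocoded counts.
        completed = Counter(geocoded_counts)
        for zip_code, n_missing in nongeocoded_by_zip.items():
            weights = zip_weights[zip_code]
            candidates = list(weights)
            # Assign each nongeocoded case to an area within its zip code,
            # with probability proportional to the population weight.
            picks = rng.choices(candidates,
                                weights=[weights[a] for a in candidates],
                                k=n_missing)
            completed.update(picks)
        for a in areas:
            draws[a].append(completed[a])

    # Summarize across imputations (requires n_imputations >= 2).
    return {a: (mean(draws[a]), variance(draws[a])) for a in areas}


# Hypothetical example data.
geocoded = {"tract_A": 12, "tract_B": 7, "tract_C": 3}
missing = {"Z1": 5, "Z2": 2}
weights = {"Z1": {"tract_A": 800, "tract_B": 200},
           "Z2": {"tract_B": 300, "tract_C": 700}}

for area, (m, v) in impute_counts(geocoded, missing, weights).items():
    print(area, round(m, 2), round(v, 2))
```

Because every imputation assigns each nongeocoded case to exactly one area, the imputed means sum to the total number of cases, geocoded and nongeocoded combined. Weights stratified by age, gender, and race, as in the best-performing strategy reported above, could be supplied through the same weight table.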

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84f6/2818716/10c506a8db76/pone.0008998.g001.jpg

Similar articles

Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes.

Int J Health Geogr. 2009 Oct 8;8:54. doi: 10.1186/1476-072X-8-54.

Estimating the accuracy of geographical imputation.

Int J Health Geogr. 2008 Jan 23;7:3. doi: 10.1186/1476-072X-7-3.

Geocoding addresses from a large population-based study: lessons learned.

Epidemiology. 2003 Jul;14(4):399-407. doi: 10.1097/01.EDE.0000073160.79633.c1.

Improving geocoding practices: evaluation of geocoding tools.

J Med Syst. 2004 Aug;28(4):361-70. doi: 10.1023/b:joms.0000032851.76239.e3.

Quantifying geocode location error using GIS methods.

Environ Health. 2007 Apr 4;6:10. doi: 10.1186/1476-069X-6-10.

Post office box addresses: a challenge for geographic information system-based studies.

Epidemiology. 2003 Jul;14(4):386-91. doi: 10.1097/01.EDE.0000073161.66729.89.

POINT: Pipeline for Offline Conversion and Integration of Geocodes and Neighborhood Data.

Appl Clin Inform. 2023 Oct;14(5):833-842. doi: 10.1055/a-2148-6414. Epub 2023 Aug 4.

An effective and efficient approach for manually improving geocoded data.

Int J Health Geogr. 2008 Nov 26;7:60. doi: 10.1186/1476-072X-7-60.

Improved Geocoding of Cancer Registry Addresses in Urban and Rural Oklahoma.

J Registry Manag. 2020 Spring;47(1):13-20.

Cited by

A multi-constraint Monte Carlo Simulation approach to downscaling cancer data.

Health Place. 2025 Jan;91:103411. doi: 10.1016/j.healthplace.2024.103411. Epub 2025 Jan 6.

The quality of social determinants data in the electronic health record: a systematic review.

J Am Med Inform Assoc. 2021 Dec 28;29(1):187-196. doi: 10.1093/jamia/ocab199.

The association of neighborhood-level social class and tobacco consumption with adverse lung cancer characteristics in Maryland.

Tob Induc Dis. 2019 Jan 25;17:06. doi: 10.18332/tid/100525. eCollection 2019.

Spatiotemporal Analysis of Oklahoma Tobacco Helpline Registrations Using Geoimputation and Joinpoint Analysis.

J Public Health Manag Pract. 2019 Sep/Oct;25 Suppl 5, Tribal Epidemiology Centers: Advancing Public Health in Indian Country for Over 20 Years:S61-S69. doi: 10.1097/PHH.0000000000000996.

Geographic Imputation of Missing Activity Space Data from Ecological Momentary Assessment (EMA) GPS Positions.

Int J Environ Res Public Health. 2018 Dec 4;15(12):2740. doi: 10.3390/ijerph15122740.

Evaluation of geoimputation strategies in a large case study.

Int J Health Geogr. 2018 Jul 31;17(1):30. doi: 10.1186/s12942-018-0151-y.

A geographic information system-based method for estimating cancer rates in non-census defined geographical areas.

Cancer Causes Control. 2017 Oct;28(10):1095-1104. doi: 10.1007/s10552-017-0941-8. Epub 2017 Aug 20.

Neighborhood Factors and Fall-Related Injuries among Older Adults Seen by Emergency Medical Service Providers.

Int J Environ Res Public Health. 2017 Feb 8;14(2):163. doi: 10.3390/ijerph14020163.

Using Small-Area Estimation to Calculate the Prevalence of Smoking by Subcounty Geographic Areas in King County, Washington, Behavioral Risk Factor Surveillance System, 2009-2013.

Prev Chronic Dis. 2016 May 5;13:E59. doi: 10.5888/pcd13.150536.

The association of area-level social class and tobacco use with adverse breast cancer characteristics among white and black women: evidence from Maryland, 1992-2003.

Int J Health Geogr. 2015 Apr 1;14:13. doi: 10.1186/s12942-015-0007-7.