Dale L. Zimmerman
Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA.
Biometrics. 2008 Mar;64(1):262-70. doi: 10.1111/j.1541-0420.2007.00870.x. Epub 2007 Aug 3.
The estimation of spatial intensity is an important inference problem in spatial epidemiologic studies. A standard data assimilation component of these studies is the assignment of a geocode, that is, point-level spatial coordinates, to the address of each subject in the study population. Unfortunately, when geocoding is performed by the standard automated method of street-segment matching to a georeferenced road file and subsequent interpolation, it is rarely completely successful. Typically, 10-30% of the addresses in the study population, and even higher percentages in particular subgroups, fail to geocode, potentially leading to a selection bias, called geographic bias, and an inefficient analysis. Missing-data methods could be considered for analyzing such data; however, because there is almost always some geographic information coarser than a point (e.g., a Zip code) observed for the addresses that fail to geocode, a coarsened-data analysis is more appropriate. This article develops methodology for estimating spatial intensity from coarsened geocoded data. Both nonparametric (kernel smoothing) and likelihood-based estimation procedures are considered. Substantial improvements in the estimation quality of coarsened-data analyses relative to analyses of only the observations that geocode are demonstrated via simulation and an example from a rural health study in Iowa.
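The abstract does not spell out the estimators, but a small sketch can show why the coarse information matters. The Python example below is an assumption for illustration, not necessarily the estimator developed in the article. It uses one simple coarsened-data kernel estimator, inverse-geocoding-rate weighting: each geocoded point in coarse region r (for example, a ZIP code) is up-weighted by n_r / m_r. Here n_r is the number of study addresses in r and m_r is the number that geocoded. The sketch assumes geocoding success is unrelated to location within each region. The record layout (x, y, region_id), the Gaussian kernel, and the names region_weights, gaussian_kernel and weighted_intensity are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical record layout (an assumption, not from the article):
# each geocoded subject is (x, y, region_id), where region_id is the coarse
# unit (e.g. a ZIP code) observed for every subject, geocoded or not.
Point = tuple[float, float, str]


def region_weights(geocoded: list[Point], region_totals: dict[str, int]) -> list[float]:
    """Weight each geocoded point by n_r / m_r for its region r.

    n_r = all subjects in region r (geocoded or not, from region_totals),
    m_r = subjects in region r that geocoded (counted from `geocoded`).
    Assumes geocoding success is unrelated to location *within* a region.
    Regions with no geocoded subjects cannot be represented by this sketch.
    """
    geocoded_counts = Counter(region for _, _, region in geocoded)
    return [region_totals[region] / geocoded_counts[region] for _, _, region in geocoded]


def gaussian_kernel(dx: float, dy: float, h: float) -> float:
    """Isotropic bivariate Gaussian kernel with bandwidth h (integrates to 1)."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (2.0 * math.pi * h * h)


def weighted_intensity(s: tuple[float, float], geocoded: list[Point],
                       region_totals: dict[str, int], h: float) -> float:
    """Weighted kernel estimate of the intensity at location s."""
    weights = region_weights(geocoded, region_totals)
    sx, sy = s
    return sum(w * gaussian_kernel(sx - x, sy - y, h)
               for w, (x, y, _) in zip(weights, geocoded))


if __name__ == "__main__":
    pts = [(0.0, 0.0, "A"), (1.0, 0.0, "A"), (5.0, 5.0, "B")]
    totals = {"A": 4, "B": 1}  # two of region A's four addresses failed to geocode
    print(region_weights(pts, totals))  # [2.0, 2.0, 1.0]
    print(weighted_intensity((0.5, 0.0), pts, totals, h=1.0))
```

Because each kernel integrates to one, this estimated intensity integrates to the full study count (the sum of the n_r) rather than to the geocoded count alone. Dropping the non-geocoded addresses omits exactly this correction, and it does so unevenly across regions with different geocoding rates.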