Auchincloss Amy H, Diez Roux Ana V, Brown Daniel G, Raghunathan Trivellore E, Erdmann Christine A
Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan 48104, USA.
Epidemiology. 2007 Jul;18(4):469-78. doi: 10.1097/EDE.0b013e3180646320.
The measurement of area-level attributes remains a major challenge in studies of neighborhood health effects. Even when neighborhood survey data are collected, they necessarily have incomplete spatial coverage. We investigated whether interpolation of neighborhood survey data was aided by information on spatial dependencies and supplementary data. Neighborhood "availability of healthy foods" was measured in a population-based survey of 5186 persons in Baltimore, New York, and Forsyth County (North Carolina). The following supplementary data were compiled from Census 2000 and InfoUSA, Inc.: distance to supermarkets, density of supermarkets and fruit and vegetable stores, housing density, distance to a high-income area, and percent of households that do not own a vehicle. We compared 4 interpolation models (ordinary least squares, residual kriging, spatial error regression, and thin-plate splines) using error statistics and Pearson correlation coefficients (r) from repeated replications of cross-validations. There was positive spatial autocorrelation in neighborhood availability of healthy foods (by site, Moran coefficient range = 0.10-0.28; all P<0.0001). Prediction performances were generally similar for the evaluated models (r approximately 0.35 for Baltimore and Forsyth; r approximately 0.54 for New York). Supplementary data accounted for much of the spatial autocorrelation and, thus, spatial modeling was only advantageous when spatial correlation was at least moderate. A variety of interpolation techniques will likely need to be utilized in order to increase the data available for examining health effects of residential environments. The most appropriate method will vary depending on the construct of interest, availability of relevant supplementary data, and types of observed spatial patterns.
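The two quantities the abstract reports, the Moran coefficient and the cross-validated Pearson r, can be illustrated with a minimal sketch. This is not the authors' code. The k-nearest-neighbour weights, the number of folds and replications, the synthetic data, and the function names `morans_i` and `repeated_cv_pearson` are all assumptions made for illustration, and only the ordinary least squares model is shown.

```python
import numpy as np

def morans_i(values, coords, k=4):
    """Moran's I using binary k-nearest-neighbour weights (assumed scheme)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    w = np.zeros((n, n))
    nearest = np.argsort(d, axis=1)[:, :k]
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    z = values - values.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

def repeated_cv_pearson(X, y, n_reps=20, n_folds=5, seed=0):
    """Mean Pearson r between observed values and held-out OLS predictions."""
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    rs = []
    for _ in range(n_reps):
        folds = np.array_split(rng.permutation(len(y)), n_folds)
        pred = np.empty(len(y))
        for test in folds:
            train = np.setdiff1d(np.arange(len(y)), test)
            beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
            pred[test] = X1[test] @ beta
        rs.append(np.corrcoef(y, pred)[0, 1])
    return float(np.mean(rs))

# Synthetic stand-in data (hypothetical): a 10 x 10 grid of respondents whose
# score follows one covariate that varies smoothly from west to east.
rng = np.random.default_rng(42)
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
covariate = coords[:, 0] + rng.normal(0.0, 0.1, size=100)
score = 2.0 * covariate + rng.normal(0.0, 0.1, size=100)

moran = morans_i(score, coords)
cv_r = repeated_cv_pearson(covariate.reshape(-1, 1), score)
print(f"Moran's I = {moran:.2f}, cross-validated r = {cv_r:.2f}")
```

To mirror the study's comparison, one would fit each candidate model inside the inner loop and, for the spatial models, compute Moran's I on the regression residuals to check how much autocorrelation the supplementary data leave unexplained.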