Whitsel Eric A, Quibrera P Miguel, Smith Richard L, Catellier Diane J, Liao Duanping, Henley Amanda C, Heiss Gerardo
Department of Epidemiology, University of North Carolina, Cardiovascular Disease Program, Bank of America Center Suite 306, 137 East Franklin Street, Chapel Hill, NC 27514, USA.
Epidemiol Perspect Innov. 2006 Jul 20;3:8. doi: 10.1186/1742-5573-3-8.
Published studies of geocoding accuracy often focus on a single geographic area, address source, or vendor; do not adjust accuracy measures for address characteristics; and do not examine the effects of inaccuracy on exposure measures. We addressed these issues in a Women's Health Initiative ancillary study, the Environmental Epidemiology of Arrhythmogenesis in WHI.
Addresses in 49 U.S. states (n = 3,615) with established coordinates were geocoded by four vendors (A-D). There were important differences among vendors in address match rate (98%; 82%; 81%; 30%), concordance between established and vendor-assigned census tracts (85%; 88%; 87%; 98%), and distance between established and vendor-assigned coordinates (mean rho [meters]: 1809; 748; 704; 228). Mean rho was lowest among street-matched, complete, zip-coded, unedited, and urban addresses, and among addresses with North American Datum of 1983 or World Geodetic System of 1984 coordinates. In mixed models restricted to vendors with minimally acceptable match rates (A-C) and adjusted for address characteristics, within-address correlation, and among-vendor heteroscedasticity of rho, differences in mean rho were small for street-type matches (280; 268; 275), i.e., likely to bias results relying on them about equally for most applications. In contrast, differences between centroid-type matches were substantial in some vendor contrasts but not others (5497; 4303; 4210; p(interaction) < 10⁻⁴), i.e., more likely to bias results differently in many applications. The adjusted odds of an address match were higher for vendor A versus C (odds ratio [OR] = 66, 95% confidence interval [CI]: 47, 93), but not for B versus C (OR = 1.1, 95% CI: 0.9, 1.3). The adjusted odds of census tract concordance were no higher for vendor A versus C (OR = 1.0, 95% CI: 0.9, 1.2) or B versus C (OR = 1.1, 95% CI: 0.9, 1.3). Misclassification of a related exposure measure, distance to the nearest highway, increased with mean rho; in the absence of confounding, non-differential misclassification of this distance biased its hypothetical association with coronary heart disease mortality toward the null.
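The three per-vendor accuracy measures above (address match rate, census tract concordance, and rho, the distance between established and vendor-assigned coordinates) can be sketched as follows. This is a minimal illustration, not the study's code. It assumes rho is a great-circle (haversine) distance on a spherical Earth, that tract concordance is computed among matched addresses only, and that the `GeocodeResult` record and its field names are hypothetical. It also ignores datum conversions (e.g., NAD83 versus WGS84), which a real comparison would need to handle.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Mean Earth radius in meters; assumes a spherical Earth (an approximation).
EARTH_RADIUS_M = 6_371_008.8


@dataclass
class GeocodeResult:
    """Hypothetical record pairing an address's established location with one vendor's output."""
    est_lat: float
    est_lon: float
    est_tract: str
    vendor_lat: Optional[float] = None  # None means the vendor did not match the address
    vendor_lon: Optional[float] = None
    vendor_tract: Optional[str] = None


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def summarize_vendor(
    results: List[GeocodeResult],
) -> Tuple[float, Optional[float], Optional[float]]:
    """Return (match rate, tract concordance among matches, mean rho in meters among matches)."""
    matched = [r for r in results if r.vendor_lat is not None and r.vendor_lon is not None]
    match_rate = len(matched) / len(results)
    if not matched:
        # Concordance and rho are undefined when the vendor matched nothing.
        return match_rate, None, None
    concordance = sum(r.vendor_tract == r.est_tract for r in matched) / len(matched)
    rhos = [haversine_m(r.est_lat, r.est_lon, r.vendor_lat, r.vendor_lon) for r in matched]
    return match_rate, concordance, sum(rhos) / len(rhos)
```

Calling `summarize_vendor` once per vendor on the same set of addresses yields the kind of unadjusted per-vendor triplet reported above. The adjusted comparisons in the results additionally relied on mixed models accounting for address characteristics, within-address correlation, and heteroscedasticity, which this sketch does not attempt.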
Geocoding error depends on the measures used to evaluate it, on address characteristics, and on the vendor. Vendor selection presents a trade-off between potential for missing data and error in estimating spatially defined attributes. Informed selection is needed to control the trade-off and adjust analyses for its effects.
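The results attribute part of the downstream effect of geocoding error to non-differential misclassification of distance to the nearest highway, which biased a hypothetical association with coronary heart disease mortality toward the null. A minimal sketch of that mechanism follows. It is not the study's analysis: it assumes the distance is dichotomized into a hypothetical binary exposure (e.g., "lives near a highway"), that geocoding error acts through assumed sensitivity and specificity that are identical for cases and non-cases, and that the prevalence and risks are illustrative numbers rather than study data.

```python
def misclassified_or(
    p_exposed: float,
    risk_exposed: float,
    risk_unexposed: float,
    sensitivity: float,
    specificity: float,
) -> float:
    """Expected odds ratio after non-differential misclassification of a binary exposure.

    All inputs are hypothetical. Sensitivity and specificity describe how often
    geocoded locations classify the exposure correctly; they are applied identically
    to cases and non-cases, which is what makes the misclassification non-differential.
    Attenuation toward OR = 1 is expected when sensitivity + specificity > 1.
    """
    # Expected true cell proportions: cases and non-cases by true exposure status.
    cases_exp = p_exposed * risk_exposed
    noncases_exp = p_exposed * (1 - risk_exposed)
    cases_unexp = (1 - p_exposed) * risk_unexposed
    noncases_unexp = (1 - p_exposed) * (1 - risk_unexposed)

    # Reclassify each true cell using the same sensitivity and specificity.
    a = sensitivity * cases_exp + (1 - specificity) * cases_unexp        # observed exposed cases
    c = (1 - sensitivity) * cases_exp + specificity * cases_unexp        # observed unexposed cases
    b = sensitivity * noncases_exp + (1 - specificity) * noncases_unexp  # observed exposed non-cases
    d = (1 - sensitivity) * noncases_exp + specificity * noncases_unexp  # observed unexposed non-cases
    return (a * d) / (b * c)
```

With an exposure prevalence of 0.2 and risks of 0.02 versus 0.01, the true odds ratio (sensitivity = specificity = 1) is about 2.02; assumed sensitivity 0.8 and specificity 0.9 give about 1.59, and sensitivity = specificity = 0.7 gives about 1.25. Larger positional error (higher rho) would plausibly lower sensitivity and specificity, pulling the estimate further toward the null.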