Novartis Pharma AG, WSJ-103.1, Novartis Campus, Basel, Switzerland.
MRC-PHE Centre for Environment and Health, Imperial College London, St Mary's Campus, Norfolk Place, London, UK.
Biostatistics. 2019 Jan 1;20(1):1-16. doi: 10.1093/biostatistics/kxx058.
Small area ecological studies are commonly used in epidemiology to assess the impact of area level risk factors on health outcomes when data are only available in an aggregated form. However, the resulting estimates are often biased due to unmeasured confounders, which typically are not available from the standard administrative registries used for these studies. Extra information on confounders can be provided through external data sets such as surveys or cohorts, where the data are available at the individual level rather than at the area level; however, such data typically lack the geographical coverage of administrative registries. We develop an analysis framework which combines ecological and individual level data from different sources to provide a less biased, adjusted estimate of the effect of area level risk factors. Our method (i) summarizes all available individual level confounders into an area level scalar variable, which we call the ecological propensity score (EPS), (ii) implements a hierarchical structured approach to impute the values of the EPS wherever they are missing, and (iii) includes the estimated and imputed EPS in the ecological regression linking the risk factors to the health outcome. Through a simulation study, we show that integrating individual level data into small area analyses via the EPS is a promising method to reduce the bias due to unmeasured confounders that is intrinsic to ecological studies; we also apply the method to a real case study to evaluate the effect of air pollution on coronary heart disease hospital admissions in Greater London.
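The three steps can be illustrated with a small simulation. The sketch below is not the authors' implementation; it is a simplified stand-in under stated assumptions. All names (`run_sketch`, `fit_poisson`, `TRUE_BETA`) and all numeric settings are hypothetical. The scalar score is approximated by the fitted values of a least-squares regression of exposure on survey-based confounder means, and plain mean imputation is swapped in for the hierarchical structured imputation described in the abstract.

```python
import numpy as np

TRUE_BETA = 0.2  # assumed true log relative risk per unit exposure (simulation only)


def fit_poisson(X, y, offset, n_iter=25):
    """Poisson log-linear regression fitted by iteratively reweighted least squares."""
    mu = y + 0.5                       # common IRLS starting value, avoids log(0)
    eta = np.log(mu)
    for _ in range(n_iter):
        z = eta - offset + (y - mu) / mu   # working response
        XtW = X.T * mu                     # IRLS weights equal the fitted means
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta + offset
        mu = np.exp(eta)
    return beta


def run_sketch(seed=0, n_areas=300, n_survey=50, coverage=0.7):
    rng = np.random.default_rng(seed)

    # Area-level means of two confounders; not recorded in the registry data.
    conf = rng.normal(size=(n_areas, 2))
    exposure = conf @ np.array([0.6, 0.4]) + rng.normal(scale=0.5, size=n_areas)

    # Aggregated outcome: observed counts, with expected counts as the offset.
    expected = np.full(n_areas, 100.0)
    log_rr = TRUE_BETA * exposure + conf @ np.array([0.3, 0.2])
    counts = rng.poisson(expected * np.exp(log_rr)).astype(float)

    # Individual-level survey, available only in a subset of areas.
    covered = rng.random(n_areas) < coverage
    n_cov = int(covered.sum())
    survey = conf[covered][:, None, :] + rng.normal(size=(n_cov, n_survey, 2))
    survey_means = survey.mean(axis=1)

    # Step (i): collapse the confounders into one scalar score per covered area
    # (assumption: fitted values from regressing exposure on survey means).
    design = np.column_stack([np.ones(n_cov), survey_means])
    coef, *_ = np.linalg.lstsq(design, exposure[covered], rcond=None)
    eps = np.full(n_areas, np.nan)
    eps[covered] = design @ coef

    # Step (ii): impute the score where no survey exists
    # (assumption: crude mean imputation, NOT the hierarchical approach).
    eps_filled = np.where(covered, eps, np.nanmean(eps))

    # Step (iii): ecological Poisson regression, without and with the score.
    ones = np.ones(n_areas)
    offset = np.log(expected)
    naive = fit_poisson(np.column_stack([ones, exposure]), counts, offset)
    adjusted = fit_poisson(
        np.column_stack([ones, exposure, eps_filled]), counts, offset
    )
    return {
        "naive": float(naive[1]),
        "adjusted": float(adjusted[1]),
        "n_imputed": n_areas - n_cov,
    }


result = run_sketch()
print(result)
```

Comparing `result["naive"]` and `result["adjusted"]` with `TRUE_BETA` shows how much of the confounding bias the score removes in this toy setting. Because mean imputation carries no information about the confounders, residual bias is expected to remain in the areas without survey data, which is the gap a structured imputation model is meant to narrow.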