Antonelli Joseph, Zigler Corwin, Dominici Francesca
Department of Biostatistics, Harvard TH Chan School of Public Health, 655 Huntington Avenue, Boston, MA, 02115, USA.
Biostatistics. 2017 Jul 1;18(3):553-568. doi: 10.1093/biostatistics/kxx003.
In comparative effectiveness research, we are often interested in estimating an average causal effect (ACE) from large observational data (the main study). Often these data do not measure all the necessary confounders. On many occasions, an extensive set of additional covariates is measured for a smaller and non-representative population (the validation study). In this setting, standard approaches for missing data imputation might not be adequate due to the large number of missing covariates in the main data relative to the smaller sample size of the validation data. We propose a Bayesian approach to estimate the ACE in the main study that borrows information from the validation study to improve confounding adjustment. Our approach combines ideas of Bayesian model averaging, confounder selection, and missing data imputation into a single framework. It allows for different treatment effects in the main study and in the validation study, and propagates the uncertainty due to the missing data imputation and confounder selection when estimating the ACE in the main study. We compare our method to several existing approaches via simulation. We apply our method to a study examining the effect of surgical resection on survival among 10,396 Medicare beneficiaries with a brain tumor when additional covariate information is available on 2,220 patients in SEER-Medicare. We find that the estimated ACE decreases by 30% when incorporating additional information from SEER-Medicare.