Shortreed Susan M, Ertefaie Ashkan
Biostatistics Unit, Group Health Research Institute, Seattle, Washington, U.S.A.
Department of Biostatistics, University of Washington, School of Public Health, Seattle, Washington, U.S.A.
Biometrics. 2017 Dec;73(4):1111-1122. doi: 10.1111/biom.12679. Epub 2017 Mar 8.
Methodological advancements, including propensity score methods, have improved unbiased estimation of treatment effects from observational data. Traditionally, a "throw in the kitchen sink" approach has been used to select covariates for inclusion in the propensity score, but recent work shows that including unnecessary covariates can affect both the bias and the statistical efficiency of propensity score estimators. In particular, including covariates that affect exposure but not the outcome can inflate standard errors without reducing bias, while including covariates associated with the outcome but unrelated to exposure can improve precision. We propose the outcome-adaptive lasso for selecting appropriate covariates for inclusion in propensity score models, so as to account for confounding bias while maintaining statistical efficiency. The proposed approach can perform variable selection in the presence of a large number of spurious covariates, that is, covariates related to neither outcome nor exposure. We present theoretical and simulation results indicating that the outcome-adaptive lasso selects the propensity score model that includes all true confounders and predictors of outcome, while excluding other covariates. We illustrate covariate selection with the outcome-adaptive lasso, and compare it with alternative approaches, using simulated data and data from a survey of patients using opioid therapy to manage chronic pain.
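The selection idea summarized in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes the penalty weights take the usual adaptive-lasso form, an inverse power of the coefficients from an unpenalized outcome regression, and it fits the weighted L1-penalized logistic propensity model with a plain proximal-gradient loop. The function names, the simulated data, and the fixed values of `lam` and `gamma` are all hypothetical choices for this example. How the tuning parameters are actually selected is not described in the abstract and is not reproduced here.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of a weighted L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def outcome_adaptive_lasso(X, A, Y, lam=0.005, gamma=2.0, n_iter=5000):
    """Sketch: propensity model with outcome-based adaptive L1 penalty weights.

    lam and gamma are illustrative fixed values (an assumption of this sketch).
    Returns the penalized propensity coefficients and the outcome coefficients.
    """
    n, p = X.shape

    # Step 1: unpenalized outcome regression Y ~ 1 + A + X (ordinary least squares).
    D = np.column_stack([np.ones(n), A, X])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    beta = coef[2:]  # coefficients on the covariates only

    # Step 2: covariates weakly related to the outcome receive a heavy penalty.
    w = np.abs(beta) ** (-gamma)

    # Step 3: proximal gradient descent on
    #   mean logistic loss + lam * sum_j w_j * |alpha_j|   (intercept unpenalized).
    Z = np.column_stack([np.ones(n), X])
    # The gradient of the mean logistic loss is Lipschitz with constant
    # lambda_max(Z'Z) / (4n); use its inverse as the step size.
    step = 4.0 * n / np.linalg.eigvalsh(Z.T @ Z).max()
    thresh = step * lam * np.concatenate([[0.0], w])
    alpha = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Z @ alpha)))
        grad = Z.T @ (prob - A) / n
        alpha = soft_threshold(alpha - step * grad, thresh)
    return alpha[1:], beta


# Hypothetical simulated data with four covariates:
#   X[:, 0] confounder (affects exposure and outcome)
#   X[:, 1] outcome-only predictor
#   X[:, 2] exposure-only predictor
#   X[:, 3] spurious covariate
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))
p_true = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] + 0.8 * X[:, 2])))
A = rng.binomial(1, p_true).astype(float)
Y = 1.0 * A + X[:, 0] + X[:, 1] + rng.normal(size=n)

alpha, beta = outcome_adaptive_lasso(X, A, Y)
print("outcome coefficients:   ", np.round(beta, 3))
print("propensity coefficients:", np.round(alpha, 3))
print("selected covariates:    ", np.flatnonzero(alpha != 0.0))
```

In this setup the exposure-only and spurious covariates have outcome coefficients near zero, so their penalty weights are very large and their propensity coefficients are driven exactly to zero. The confounder is only lightly penalized and stays in the model. The outcome-only predictor is also lightly penalized, so it is not forced out of the propensity model the way an exposure-only predictor is.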