Loh Wen Wei, Vansteelandt Stijn
Department of Data Analysis, Ghent University, Gent, Belgium.
Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium.
Stat Med. 2021 Feb 10;40(3):607-630. doi: 10.1002/sim.8792. Epub 2020 Nov 4.
Inferring the causal effect of a treatment on an outcome in an observational study requires adjusting for observed baseline confounders to avoid bias. However, adjusting for all observed baseline covariates, when only a subset are confounders of the effect of interest, is known to yield potentially inefficient and unstable estimators of the treatment effect. It also raises the risk of finite-sample bias and of bias due to model misspecification. For these reasons, confounder (or covariate) selection is commonly used to determine a subset of the available covariates that is sufficient for confounding adjustment. In this article, we propose a confounder selection strategy that focuses on stable estimation of the treatment effect. In particular, when the propensity score (PS) model already includes covariates that are sufficient to adjust for confounding, adding covariates that are associated with either treatment or outcome alone, but not both, should not systematically change the effect estimator. The proposal therefore entails first prioritizing covariates for inclusion in the PS model, then using a change-in-estimate approach to select the smallest adjustment set that yields a stable effect estimate. The ability of the proposal to correctly select confounders, and to ensure valid inference for the treatment effect following data-driven covariate selection, is assessed empirically and compared with existing methods using simulation studies. We demonstrate the procedure using three different publicly available datasets commonly used for causal inference.
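The two-step idea described above (order the covariates, then look for the smallest adjustment set after which the effect estimate stops moving) can be illustrated with a minimal sketch. It is not the authors' procedure: the priority rule (absolute correlation with treatment), the normalized inverse probability weighting estimator, the fixed tolerance `tol`, the simulated data, and all function names are assumptions made only for illustration.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (a - p) - ridge * beta
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipw_estimate(L, a, y):
    """Normalized IPW estimate of the average treatment effect, given PS covariates L."""
    X = np.column_stack([np.ones(len(a)), L])
    ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, a)))
    ps = np.clip(ps, 0.01, 0.99)  # guard against extreme weights
    w1, w0 = a / ps, (1.0 - a) / (1.0 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def select_stable_set(L, a, y, tol=0.1):
    """Hypothetical change-in-estimate rule over a nested sequence of PS models."""
    n_cov = L.shape[1]
    # Assumed priority rule: strongest marginal association with treatment first.
    score = np.abs([np.corrcoef(L[:, j], a)[0, 1] for j in range(n_cov)])
    order = np.argsort(-score)
    # Effect estimate for each nested set: first 0, 1, ..., n_cov covariates.
    estimates = np.array(
        [ipw_estimate(L[:, order[:k]], a, y) for k in range(n_cov + 1)]
    )
    # Smallest k whose estimate stays within tol of every larger model's estimate.
    for k in range(n_cov + 1):
        if np.all(np.abs(estimates[k:] - estimates[k]) <= tol):
            return order[:k], estimates[k], estimates

# Simulated data (assumption): columns 0 and 1 are confounders, column 2 affects
# treatment only, column 3 affects outcome only, columns 4 and 5 are noise.
rng = np.random.default_rng(0)
n = 4000
L = rng.normal(size=(n, 6))
p_treat = 1.0 / (1.0 + np.exp(-0.8 * (L[:, 0] + L[:, 1] + L[:, 2])))
a = rng.binomial(1, p_treat).astype(float)
y = 1.0 * a + L[:, 0] + L[:, 1] + L[:, 3] + rng.normal(size=n)  # true effect is 1

selected, estimate, estimates = select_stable_set(L, a, y)
print("selected covariate indices:", selected)
print("effect estimate at selected set:", round(float(estimate), 3))
```

In this sketch the covariate that predicts treatment only can still be picked up, because the assumed ordering ranks it alongside the confounders; a priority rule that also uses the outcome association would behave differently. The fixed tolerance is likewise a simplification of a data-driven stability criterion.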