Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.
Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands.
Biom J. 2024 Jan;66(1):e2100237. doi: 10.1002/bimj.202100237. Epub 2022 May 12.
A common view in epidemiology is that automated confounder selection methods, such as backward elimination, should be avoided because they can lead to biased effect estimates and underestimated variances. Nevertheless, backward elimination is still regularly applied. We investigated whether, and under which conditions, causal effect estimation in observational studies can be improved by applying backward elimination to a prespecified set of potential confounders. We derived an expression that quantifies how variable omission relates to the bias and variance of effect estimators. Additionally, we defined 3960 scenarios and investigated them in simulations that compared the bias and mean squared error (MSE) of the conditional log odds ratio, log(cOR), and the marginal log risk ratio, log(mRR), between full models including all prespecified covariates and models obtained by backward elimination of these covariates. Backward elimination resulted in a mean bias of 0.03 for log(cOR) and 0.02 for log(mRR). By comparison, a model without any covariate adjustment had a mean bias of 0.56 for log(cOR) and 0.52 for log(mRR), and the full model had no bias. In fewer than 3% of the scenarios considered, the MSE of log(cOR) or log(mRR) was slightly lower (by at most 3%) with backward elimination than with the full model. When an initial set of potential confounders can be specified based on background knowledge, backward elimination has minimal added value. We advise against its use; researchers who nevertheless apply it should provide ample arguments supporting that choice.
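The comparison described above can be illustrated with a small simulation. The sketch below is not the authors' code and does not reproduce any of the 3960 scenarios; it only shows the general shape of such a comparison for log(cOR). All specifics are assumptions made for illustration: four standard normal covariates (two true confounders, two noise variables), a true conditional log odds ratio of 0.5, Wald-test backward elimination at a hypothetical threshold `alpha = 0.05`, and the helper names `fit_logistic`, `backward_elimination`, and `simulate`. Bias is measured against the conditional log odds ratio, so the unadjusted model's bias here mixes confounding with noncollapsibility.

```python
import math
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic fit by Newton-Raphson.

    Returns the coefficient vector and its model-based standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])          # observed information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def backward_elimination(X, y, keep, alpha):
    """Repeatedly drop the covariate with the largest Wald p-value above alpha.

    Columns listed in `keep` (intercept, exposure) are never removed.
    Returns the indices of the retained columns.
    """
    cols = list(range(X.shape[1]))
    while True:
        beta, se = fit_logistic(X[:, cols], y)
        # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
        candidates = [(p, c) for p, c in zip(pvals, cols) if c not in keep]
        if not candidates:
            return cols
        worst_p, worst_col = max(candidates)
        if worst_p <= alpha:
            return cols
        cols.remove(worst_col)


def simulate(n_rep=200, n=500, true_log_or=0.5, alpha=0.05, seed=1):
    """Bias and MSE of the exposure log odds ratio under three strategies."""
    rng = np.random.default_rng(seed)
    # Assumed covariate effects: first two are confounders, last two are noise.
    gamma = np.array([0.5, 0.5, 0.0, 0.0])
    estimates = {"full": [], "backward": [], "unadjusted": []}
    for _ in range(n_rep):
        L = rng.normal(size=(n, 4))
        a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(L @ gamma))))
        lin = -0.5 + true_log_or * a + L @ gamma
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
        X = np.column_stack([np.ones(n), a, L])  # columns: intercept, exposure, covariates
        estimates["full"].append(fit_logistic(X, y)[0][1])
        cols = backward_elimination(X, y, keep={0, 1}, alpha=alpha)
        estimates["backward"].append(fit_logistic(X[:, cols], y)[0][1])
        estimates["unadjusted"].append(fit_logistic(X[:, :2], y)[0][1])
    summary = {}
    for name, values in estimates.items():
        err = np.array(values) - true_log_or
        summary[name] = {"bias": float(err.mean()), "mse": float((err ** 2).mean())}
    return summary


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(f"{name:>10}: bias={stats['bias']:.3f}  mse={stats['mse']:.3f}")
```

In this single assumed scenario the confounders are strong enough that backward elimination will usually retain them, so the full and backward-elimination estimates should be close; weaker confounders or smaller samples are where the two strategies can diverge.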