Sander Greenland, Rhian Daniel, Neil Pearce
Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, USA.
Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK.
Int J Epidemiol. 2016 Apr;45(2):565-75. doi: 10.1093/ije/dyw040. Epub 2016 Apr 20.
Controlling for too many potential confounders can lead to or aggravate problems of data sparsity or multicollinearity, particularly when the number of covariates is large in relation to the study size. As a result, methods to reduce the number of modelled covariates are often deployed. We review several traditional modelling strategies, including stepwise regression and the 'change-in-estimate' (CIE) approach to deciding which potential confounders to include in an outcome-regression model for estimating effects of a targeted exposure. We discuss their shortcomings, and then provide some basic alternatives and refinements that do not require special macros or programming. Throughout, we assume the main goal is to derive the most accurate effect estimates obtainable from the data and commercial software. Allowing that most users must stay within standard software packages, this goal can be roughly approximated using basic methods to assess, and thereby minimize, mean squared error (MSE).
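As an illustration of the traditional 'change-in-estimate' (CIE) criterion mentioned in the abstract, the sketch below compares the exposure coefficient from a linear outcome regression with and without a candidate covariate. It is not the authors' recommended procedure. The simulated data, the helper names `exposure_coef` and `change_in_estimate`, and the 10% cutoff are all assumptions chosen for illustration; the cutoff is a common convention rather than a value taken from the abstract.

```python
import numpy as np

def exposure_coef(y, exposure, covariates=()):
    """OLS coefficient of `exposure` in a linear model for y (hypothetical helper)."""
    X = np.column_stack([np.ones_like(exposure), exposure, *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # column 0 is the intercept, column 1 is the exposure

def change_in_estimate(y, exposure, covariate):
    """Relative change in the exposure coefficient when `covariate` is added."""
    crude = exposure_coef(y, exposure)
    adjusted = exposure_coef(y, exposure, [covariate])
    return abs(crude - adjusted) / abs(adjusted)

# Simulated data (assumption): c confounds the exposure-outcome relation,
# z is unrelated to both exposure and outcome.
rng = np.random.default_rng(0)
n = 5000
c = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.8 * c + rng.normal(size=n)
y = 1.0 * x + 1.5 * c + rng.normal(size=n)

THRESHOLD = 0.10  # assumed 10% cutoff, for illustration only
cie_confounder = change_in_estimate(y, x, c)
cie_noise = change_in_estimate(y, x, z)

print(f"CIE for c: {cie_confounder:.3f}, retain: {cie_confounder > THRESHOLD}")
print(f"CIE for z: {cie_noise:.3f}, retain: {cie_noise > THRESHOLD}")
```

The criterion looks only at the shift in the point estimate and ignores any change in its variance, so on its own it says nothing about whether adjustment reduces the MSE of the effect estimate.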