Department of Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK; Medical Research Council Integrative Epidemiology Unit at the University of Bristol, University of Bristol, Bristol, UK.
Department of Medical Statistics, London School of Hygiene and Tropical Medicine, University of London, London, UK; Medical Research Council Clinical Trials Unit at University College London, University of London, London, UK.
J Clin Epidemiol. 2023 Aug;160:100-109. doi: 10.1016/j.jclinepi.2023.06.011. Epub 2023 Jun 19.
OBJECTIVES: Epidemiological studies often have missing data, which are commonly handled by multiple imputation (MI). Standard (default) MI procedures use simple linear covariate functions in the imputation model. We examine the bias that may result from accepting this default option and evaluate methods for identifying problematic imputation models, providing practical guidance for researchers.
STUDY DESIGN AND SETTING: Using simulation and real data analysis, we investigated how imputation model mis-specification affected MI performance, comparing results with complete records analysis (CRA). We considered scenarios in which imputation model mis-specification occurred because (i) the analysis model was mis-specified or (ii) the relationship between exposure and confounder was mis-specified.
RESULTS: Mis-specification of the relationship between outcome and exposure, or between exposure and confounder, can cause biased CRA and MI estimates (in addition to any bias in the full-data estimate due to analysis model mis-specification). MI by predictive mean matching can mitigate model mis-specification. Methods for examining model mis-specification were effective in identifying mis-specified relationships.
CONCLUSION: When using MI methods that assume data are missing at random (MAR), compatibility between the analysis and imputation models is necessary but not sufficient to avoid bias. We propose a step-by-step procedure for identifying and correcting mis-specification of imputation models.
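Scenario (ii) above can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: the data-generating model (exposure X depends on the confounder C through C squared, true exposure coefficient 1), the deliberately extreme MAR mechanism (X is missing whenever C > 0), the sample size, and all function names are assumptions chosen for illustration. It also uses a single stochastic regression imputation with no parameter draws and no pooling by Rubin's rules, so it only addresses bias in the point estimate. Under these assumptions, the default linear imputation model for X would be expected to pull the exposure coefficient away from 1, whereas adding the quadratic confounder term to the imputation model, or using CRA, should recover it approximately.

```python
import random


def ols(rows, y):
    """Least-squares coefficients for y ~ rows (each row starts with 1.0)."""
    k = len(rows[0])
    # Augmented normal equations: [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def resid_sd(rows, y, coef):
    """Residual standard deviation of a fitted linear model."""
    res = [yi - sum(b * v for b, v in zip(coef, r)) for r, yi in zip(rows, y)]
    return (sum(e * e for e in res) / (len(y) - len(coef))) ** 0.5


def simulate(n=20000, seed=1):
    """Assumed toy data: X = C^2 + noise, Y = 1*X + 1*C + noise."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    x = [ci ** 2 + rng.gauss(0, 1) for ci in c]
    y = [xi + ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]
    miss = [ci > 0 for ci in c]  # assumed MAR mechanism: depends on observed C only
    return c, x, y, miss


def impute_and_fit(c, x, y, miss, quadratic, rng):
    """Impute missing X once, then return the exposure coefficient from Y ~ X + C."""
    def design(ci, yi):
        # Imputation model for X: linear in Y and C, optionally with C^2
        return [1.0, yi, ci] + ([ci ** 2] if quadratic else [])

    obs = [i for i, m in enumerate(miss) if not m]
    rows = [design(c[i], y[i]) for i in obs]
    x_obs = [x[i] for i in obs]
    coef = ols(rows, x_obs)          # fitted on complete records only
    sd = resid_sd(rows, x_obs, coef)
    x_imp = [
        sum(b * v for b, v in zip(coef, design(ci, yi))) + rng.gauss(0, sd)
        if m else xi
        for ci, xi, yi, m in zip(c, x, y, miss)
    ]
    return ols([[1.0, xi, ci] for xi, ci in zip(x_imp, c)], y)[1]


c, x, y, miss = simulate()
rng = random.Random(2)
obs = [i for i, m in enumerate(miss) if not m]

# Complete records analysis: drop rows with missing X
est_cra = ols([[1.0, x[i], c[i]] for i in obs], [y[i] for i in obs])[1]
# Default-style imputation model (linear in Y and C)
est_linear = impute_and_fit(c, x, y, miss, quadratic=False, rng=rng)
# Imputation model that includes the quadratic confounder term
est_quadratic = impute_and_fit(c, x, y, miss, quadratic=True, rng=rng)

print("true exposure coefficient: 1.0")
print("CRA:", round(est_cra, 3))
print("linear imputation model:", round(est_linear, 3))
print("quadratic imputation model:", round(est_quadratic, 3))
```

In applied work one would normally rely on an established MI implementation with proper parameter draws and multiple imputed data sets; the hand-rolled least-squares solver here only keeps the example dependency-free.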