Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. box 7057, 1007 MB Amsterdam, The Netherlands; EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands; Department of Methodology and Applied Biostatistics, Faculty of Earth and Life Sciences, Institute for Health Sciences, VU University, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands.
Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. box 7057, 1007 MB Amsterdam, The Netherlands; EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands.
J Clin Epidemiol. 2014 Mar;67(3):335-42. doi: 10.1016/j.jclinepi.2013.09.009. Epub 2013 Dec 2.
OBJECTIVES: Regardless of the proportion of missing values, complete-case analysis is the most frequently applied approach, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data when some, many, or all item scores are missing in a multi-item instrument.
STUDY DESIGN AND SETTING: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques for handling missing data were applied to determine the best-performing technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance parameters.
RESULTS: Mean imputation caused biased estimates in every missing data scenario in which data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score.
CONCLUSION: We recommend applying MI to the item scores to obtain the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data.
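The kind of simulation summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' code or design: the sample size, number of items, effect size, noise levels, and the function names `ols_slope` and `simulate` are all assumptions chosen for illustration. Only a missing-completely-at-random mechanism is simulated, and only two simple strategies are compared (complete-case analysis and person-mean imputation of missing items). Multiple imputation is left out to keep the example dependent on NumPy alone. The two performance parameters named in the abstract, bias and coverage of the 95% confidence interval, are computed for the regression slope of the total score.

```python
import numpy as np


def ols_slope(x, y):
    """Fit y = a + b*x by least squares; return the slope b and its standard error."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], se


def simulate(n_sims=200, n=300, n_items=10, p_subject=0.25, p_item=0.3,
             beta=0.5, seed=0):
    """Bias and 95% CI coverage of the slope under MCAR item missingness.

    All parameter values are illustrative assumptions, not taken from the study.
    """
    rng = np.random.default_rng(seed)
    methods = ("complete_case", "person_mean")
    estimates = {m: [] for m in methods}
    covered = {m: [] for m in methods}

    for _ in range(n_sims):
        # A latent trait drives every item; the covariate is the item total.
        trait = rng.normal(size=n)
        items = trait[:, None] + rng.normal(size=(n, n_items))
        total = items.sum(axis=1)
        y = beta * total + rng.normal(scale=2.0, size=n)

        # MCAR: a random subset of subjects loses a random subset of items.
        affected = rng.random(n) < p_subject
        mask = affected[:, None] & (rng.random((n, n_items)) < p_item)

        # Complete-case analysis: keep only subjects with every item observed.
        complete = ~mask.any(axis=1)
        fits = {"complete_case": ols_slope(total[complete], y[complete])}

        # Person-mean imputation: fill missing items with the subject's own
        # observed item mean, i.e. total = n_items * mean(observed items).
        n_obs = (~mask).sum(axis=1)
        obs_sum = np.where(mask, 0.0, items).sum(axis=1)
        usable = n_obs > 0  # subjects with no observed items cannot be imputed
        pm_total = n_items * obs_sum[usable] / n_obs[usable]
        fits["person_mean"] = ols_slope(pm_total, y[usable])

        for m, (b, se) in fits.items():
            estimates[m].append(b)
            covered[m].append(abs(b - beta) <= 1.96 * se)

    return {
        m: {"bias": float(np.mean(estimates[m]) - beta),
            "coverage": float(np.mean(covered[m]))}
        for m in methods
    }


if __name__ == "__main__":
    for method, stats in simulate().items():
        print(method, stats)
```

Bias here is the mean estimated slope minus the true slope across replications, and coverage is the fraction of replications whose 95% interval contains the true slope. Extending the sketch to other missing data mechanisms or to MI on item scores versus total scores would be needed before drawing any comparison with the results reported above.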