Templ M, Ulmer M
Institute for Competitiveness and Communication, School of Business, University of Applied Sciences and Arts Northwestern Switzerland, Olten, Switzerland.
Institute of Data Analysis and Process Design, School of Engineering, Zurich University of Applied Sciences, Winterthur, Switzerland.
J Appl Stat. 2024 Mar 5;51(14):2894-2928. doi: 10.1080/02664763.2024.2325969. eCollection 2024.
Many imputation methods have been developed over the years, and most have been tested under idealized settings. Surprisingly, there is no detailed research on how imputation methods perform when the assumptions about the data distribution and/or the model are only partly fulfilled. This research examines the susceptibility of imputation techniques to outliers, misclassifications, and incorrect model specifications. Knowing how well the methods hold up in practice is crucial because real conditions are rarely ideal: the data may not fit the specified models well, outliers distort the estimates, and misclassifications reduce the quality of most imputation methods. Several evaluation measures are discussed, ranging from comparing imputed values with true values and comparing selected statistics to the performance of classifiers and the variance of estimated parameters. Some well-known imputation methods are compared using real data and simulations. Robust conditional imputation methods outperform the other methods on both the real data and the simulation settings.
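The first evaluation measure mentioned in the abstract, comparing imputed values with true values, can be illustrated with a small simulation. The sketch below is not the authors' method or code. It assumes a simple linear relationship, values missing completely at random, and a fixed shift of 50 as the outlier mechanism, and it uses a hand-written Theil-Sen fit as a stand-in for a robust conditional imputation model. All function names and parameters (`ols_fit`, `theil_sen_fit`, `run_demo`, `outlier_frac`, `missing_frac`) are hypothetical.

```python
import numpy as np


def ols_fit(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    slope, intercept = np.polyfit(x, y, 1)
    return intercept, slope


def theil_sen_fit(x, y):
    """Robust line: median of pairwise slopes, then median-based intercept."""
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0  # skip pairs with identical x to avoid division by zero
    slope = np.median((y[j] - y[i])[keep] / dx[keep])
    intercept = np.median(y - slope * x)
    return intercept, slope


def rmse(imputed, truth):
    """Root mean squared error between imputed and true values."""
    return float(np.sqrt(np.mean((imputed - truth) ** 2)))


def run_demo(n=300, outlier_frac=0.1, missing_frac=0.2, seed=0):
    """Impute missing y from x with both fits; return RMSE per method."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, n)
    y_true = 2.0 + 1.5 * x + rng.normal(0, 1, n)

    # Assumption: values are missing completely at random.
    missing = rng.random(n) < missing_frac
    observed = ~missing

    # Assumption: a fraction of the observed responses is shifted by +50.
    y_obs = y_true.copy()
    n_out = int(outlier_frac * observed.sum())
    out_idx = rng.choice(np.flatnonzero(observed), size=n_out, replace=False)
    y_obs[out_idx] += 50.0

    results = {}
    for name, fit in [("ols", ols_fit), ("theil_sen", theil_sen_fit)]:
        # Fit on the contaminated observed cases only.
        b0, b1 = fit(x[observed], y_obs[observed])
        # Impute the conditional mean and score against the known truth.
        imputed = b0 + b1 * x[missing]
        results[name] = rmse(imputed, y_true[missing])
    return results


if __name__ == "__main__":
    for method, error in run_demo().items():
        print(f"{method}: RMSE = {error:.3f}")
```

Under these assumptions the outliers pull the least-squares line away from the bulk of the data, so its imputations should show a larger error than those from the robust fit. Setting `outlier_frac=0` lets the two fits be compared on clean data.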