Ramosaj Burim, Tulowietzki Justus, Pauly Markus
Faculty of Statistics, TU Dortmund University, Joseph-von-Fraunhofer-Str. 2-4, 44227 Dortmund, Germany.
Entropy (Basel). 2022 Mar 9;24(3):386. doi: 10.3390/e24030386.
Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has seen an increasing trend towards using modern Machine-Learning algorithms for imputation, owing to their favorable prediction accuracy across different learning problems. In this work, we use simulation to analyze the interaction between imputation accuracy and prediction accuracy in regression problems with missing covariates when Machine-Learning-based methods are used for both imputation and prediction. We find that even a slight decrease in imputation accuracy can seriously affect prediction accuracy. In addition, we examine imputation performance when statistical inference procedures are used in prediction settings, for example the coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets from the UCI Machine Learning Repository and an extensive simulation study.
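The study design can be illustrated with a minimal sketch that measures the same three quantities: imputation accuracy, prediction accuracy, and the coverage of a prediction interval. This is not the authors' setup. The paper uses Machine-Learning methods for imputation and prediction on UCI datasets. This sketch instead assumes a hypothetical two-covariate linear data-generating model, MCAR missingness in one covariate, mean imputation versus simple regression imputation, and an ordinary least-squares predictor. All function names (`simulate`, `ols_1d`, `ols_2d`, `run_pipeline`) are illustrative. It uses only the Python standard library.

```python
import random
import statistics

# Hypothetical data-generating model (an assumption, not taken from the paper):
# x2 is correlated with x1, and y depends linearly on both.
def simulate(n, rng):
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        x2 = 0.8 * x1 + rng.gauss(0, 0.6)
        y = 1 + 2 * x1 + 3 * x2 + rng.gauss(0, 1)
        rows.append((x1, x2, y))
    return rows


def ols_1d(xs, ys):
    """Intercept and slope of the least-squares line y ~ a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def ols_2d(x1s, x2s, ys):
    """Intercept and slopes for y ~ a + b1*x1 + b2*x2 (centred normal equations, Cramer's rule)."""
    m1, m2, my = (statistics.fmean(v) for v in (x1s, x2s, ys))
    c1 = [x - m1 for x in x1s]
    c2 = [x - m2 for x in x2s]
    cy = [y - my for y in ys]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def run_pipeline(imputer, p_missing=0.4, n=2000, seed=1):
    """Return (imputation RMSE, test prediction MSE, interval coverage) for one imputer."""
    rng = random.Random(seed)  # same seed => same data and masks for every imputer
    train, test = simulate(n, rng), simulate(n, rng)
    # MCAR: each x2 value is missing independently with probability p_missing.
    miss_tr = [rng.random() < p_missing for _ in train]
    miss_te = [rng.random() < p_missing for _ in test]
    observed = [(x1, x2) for (x1, x2, _), m in zip(train, miss_tr) if not m]

    # Both imputers are fitted on the observed training entries only.
    if imputer == "mean":
        mu = statistics.fmean(x2 for _, x2 in observed)

        def impute(x1):
            return mu
    else:  # "regression": predict the missing x2 from x1
        a, b = ols_1d([o[0] for o in observed], [o[1] for o in observed])

        def impute(x1):
            return a + b * x1

    def fill(rows, mask):
        return [(x1, impute(x1) if m else x2, y) for (x1, x2, y), m in zip(rows, mask)]

    tr, te = fill(train, miss_tr), fill(test, miss_te)

    # Imputation accuracy: RMSE over the training entries that were imputed.
    sq_err = [(f[1] - t[1]) ** 2 for f, t, m in zip(tr, train, miss_tr) if m]
    imp_rmse = statistics.fmean(sq_err) ** 0.5

    # Prediction: OLS on the imputed training data, evaluated on the imputed test data.
    a0, b1, b2 = ols_2d(*zip(*tr))
    train_res = [y - (a0 + b1 * x1 + b2 * x2) for x1, x2, y in tr]
    test_res = [y - (a0 + b1 * x1 + b2 * x2) for x1, x2, y in te]
    pred_mse = statistics.fmean(r * r for r in test_res)

    # Naive 95% prediction interval (normal-theory assumption): +/- 1.96 * residual SD.
    half_width = 1.96 * statistics.stdev(train_res)
    coverage = statistics.fmean(1.0 if abs(r) <= half_width else 0.0 for r in test_res)
    return imp_rmse, pred_mse, coverage


if __name__ == "__main__":
    for name in ("mean", "regression"):
        rmse, mse, cov = run_pipeline(name)
        print(f"{name:>10}: imputation RMSE={rmse:.3f}  prediction MSE={mse:.3f}  coverage={cov:.3f}")
```

Both imputers see identical data and missingness masks because the seed is shared, so their results can be compared directly. Under these assumptions, the less accurate imputer (mean imputation) should also produce a larger prediction MSE. The naive interval ignores imputation uncertainty and the resulting heteroscedasticity, so its empirical coverage need not match the nominal 95%.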