Kowsar Ibna, Rabbani Shourav B, Samad Manar D
Department of Computer Science, Tennessee State University, Nashville, TN, United States.
Proc (IEEE Int Conf Healthc Inform). 2024 Jun;2024:177-182. doi: 10.1109/ichi61247.2024.00030. Epub 2024 Aug 22.
The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods have been developed in biostatistics and, more recently, in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values by leveraging between-feature (self-attention) or between-sample attention. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and by 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based imputation method shows superior performance across a wide range of missingness rates (10% to 50%) when values are missing completely at random.
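As a rough illustration of the evaluation setup described above, the sketch below generates a missing-completely-at-random (MCAR) mask at a given rate and scores imputations with a normalized root mean squared error over the imputed entries. The range-based normalization and the helper names (`mcar_mask`, `nrmse`) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def mcar_mask(shape, rate, seed=0):
    """Boolean mask, True where a value is removed completely at random.

    `rate` is the target missingness fraction (the paper evaluates 0.1-0.5).
    """
    rng = np.random.default_rng(seed)
    return rng.random(shape) < rate

def nrmse(x_true, x_imputed, mask):
    """Normalized RMSE computed only over the imputed (masked) entries.

    Normalizing the RMSE by the data range (max - min) is one common
    convention; the paper's exact normalization may differ.
    """
    err = (x_true - x_imputed)[mask]
    rmse = np.sqrt(np.mean(err ** 2))
    data_range = x_true.max() - x_true.min()
    return rmse / data_range
```

A lower NRMSE on the masked entries indicates a more accurate imputation; the reported 18.4% to 74.7% reductions are relative decreases in this kind of error against baseline methods.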