Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.

Authors

Kowsar Ibna, Rabbani Shourav B, Samad Manar D

Affiliation

Department of Computer Science, Tennessee State University, Nashville, TN, United States.

Publication information

Proc (IEEE Int Conf Healthc Inform). 2024 Jun;2024:177-182. doi: 10.1109/ichi61247.2024.00030. Epub 2024 Aug 22.

Abstract

The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods have been developed in biostatistics and, more recently, in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values by leveraging between-feature attention (self-attention) or between-sample attention. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and by 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness rates (10% to 50%) when the values are missing completely at random.
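
To make the evaluation setting in the abstract concrete, the following is a minimal sketch of how values can be removed completely at random and how a normalized root mean squared error can be computed on the removed cells. It is not the authors' code. The function names (`mcar_mask`, `mean_impute`, `nrmse`), the synthetic Gaussian data, the column-mean baseline, and the choice to normalize by the standard deviation of the true values at the masked cells are all assumptions made for illustration; the abstract does not state which normalization the paper uses.

```python
import numpy as np


def mcar_mask(shape, missing_rate, rng):
    """Return a boolean mask that is True where a value is removed.

    Every cell is dropped independently with the same probability,
    which is what "missing completely at random" means here.
    """
    return rng.random(shape) < missing_rate


def mean_impute(x_missing):
    """Baseline imputer (assumed): fill NaNs with the column mean of observed values."""
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)


def nrmse(x_true, x_imputed, mask):
    """RMSE over the masked cells, divided by the std of the true masked values.

    This is one common normalization; it is an assumption, not the paper's
    confirmed definition.
    """
    diff = x_true[mask] - x_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)) / np.std(x_true[mask]))


rng = np.random.default_rng(0)
x_true = rng.normal(size=(200, 5))           # synthetic stand-in for a tabular data set
mask = mcar_mask(x_true.shape, 0.3, rng)      # 30% missingness, inside the 10% to 50% range
x_missing = np.where(mask, np.nan, x_true)    # hide the masked cells
x_imputed = mean_impute(x_missing)            # any imputer could be swapped in here
score = nrmse(x_true, x_imputed, mask)
```

Under this setup, a percentage reduction in error between two imputers would be computed as `100 * (score_baseline - score_new) / score_baseline`, where both scores come from the same mask.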
