Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.

Authors

Kowsar Ibna, Rabbani Shourav B, Samad Manar D

Affiliation

Department of Computer Science, Tennessee State University, Nashville, TN, United States.

Publication

Proc (IEEE Int Conf Healthc Inform). 2024 Jun;2024:177-182. doi: 10.1109/ichi61247.2024.00030. Epub 2024 Aug 22.

DOI:10.1109/ichi61247.2024.00030
PMID:39387063
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11463999/
Abstract

The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.
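
The evaluation setting described in the abstract (values removed completely at random at rates from 10% to 50%, imputation quality scored by normalized RMSE) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not state how the RMSE is normalized, so dividing by the standard deviation of the true held-out values is an assumption, and the synthetic data, the column-mean baseline, and the helper names `mcar_mask`, `mean_impute`, `nrmse`, and `relative_reduction` are all hypothetical.

```python
import numpy as np


def mcar_mask(shape, missing_rate, rng):
    """Boolean mask, True where a value is removed.

    Each entry is dropped independently of the data, which is the
    'missing completely at random' (MCAR) setting.
    """
    return rng.random(shape) < missing_rate


def mean_impute(x_missing):
    """Fill NaNs with the per-column mean of the observed values.

    A simple statistical baseline, not the attention-based method.
    """
    col_means = np.nanmean(x_missing, axis=0)
    return np.where(np.isnan(x_missing), col_means, x_missing)


def nrmse(x_true, x_imputed, mask):
    """RMSE over the removed entries, divided by the standard deviation
    of the true removed values (assumed normalization)."""
    diff = x_true[mask] - x_imputed[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(x_true[mask])


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


rng = np.random.default_rng(0)
x_true = rng.normal(size=(500, 8))  # synthetic stand-in for a tabular data set

for rate in (0.1, 0.3, 0.5):
    mask = mcar_mask(x_true.shape, rate, rng)
    x_missing = np.where(mask, np.nan, x_true)
    x_imputed = mean_impute(x_missing)
    print(f"missing rate {rate:.0%}: NRMSE = {nrmse(x_true, x_imputed, mask):.3f}")
```

Under this convention, a reported reduction such as 18.4% would correspond to `relative_reduction(baseline_nrmse, proposed_nrmse)` returning 18.4, where the baseline is the best competing imputation method on the same masked entries.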

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff2e/11463999/a235b36f9376/nihms-2010820-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff2e/11463999/d16aa5f154e2/nihms-2010820-f0001.jpg

Similar articles

1
Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.
Proc (IEEE Int Conf Healthc Inform). 2024 Jun;2024:177-182. doi: 10.1109/ichi61247.2024.00030. Epub 2024 Aug 22.
2
Missing Value Estimation using Clustering and Deep Learning within Multiple Imputation Framework.
Knowl Based Syst. 2022 Aug 5;249. doi: 10.1016/j.knosys.2022.108968. Epub 2022 May 10.
3
Extremely missing numerical data in Electronic Health Records for machine learning can be managed through simple imputation methods considering informative missingness: A comparative of solutions in a COVID-19 mortality case study.
Comput Methods Programs Biomed. 2023 Dec;242:107803. doi: 10.1016/j.cmpb.2023.107803. Epub 2023 Sep 7.
4
Deep imputation of missing values in time series health data: A review with benchmarking.
J Biomed Inform. 2023 Aug;144:104440. doi: 10.1016/j.jbi.2023.104440. Epub 2023 Jul 8.
5
Performance of Multiple Imputation Using Modern Machine Learning Methods in Electronic Health Records Data.
Epidemiology. 2023 Mar 1;34(2):206-215. doi: 10.1097/EDE.0000000000001578. Epub 2022 Dec 9.
6
Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.
BMC Med Res Methodol. 2024 Aug 28;24(1):188. doi: 10.1186/s12874-024-02310-6.
7
Neural networks based on attention architecture are robust to data missingness for early predicting hospital mortality in intensive care unit patients.
Digit Health. 2023 May 7;9:20552076231171482. doi: 10.1177/20552076231171482. eCollection 2023 Jan-Dec.
8
A deep learning-based, unsupervised method to impute missing values in electronic health records for improved patient management.
J Biomed Inform. 2020 Nov;111:103576. doi: 10.1016/j.jbi.2020.103576. Epub 2020 Oct 1.
9
Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data.
PLoS One. 2022 Jan 13;17(1):e0262131. doi: 10.1371/journal.pone.0262131. eCollection 2022.
10
Advanced methods for missing values imputation based on similarity learning.
PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.

Cited by

1
A Machine Learning Approach for Investigating Variable Importance in Relationship and Sexual Satisfaction: The Role of Interpersonal Mindfulness and Psychological Safety.
J Marital Fam Ther. 2025 Apr;51(2):e70026. doi: 10.1111/jmft.70026.
