A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets.

Affiliations

Department of Information Engineering (DII), Università Politecnica delle Marche, Ancona, Italy.

Grenoble Informatics Laboratory, Université Grenoble Alpes, Saint-Martin-d'Hères, France.

Publication information

Comput Biol Med. 2023 Sep;163:107188. doi: 10.1016/j.compbiomed.2023.107188. Epub 2023 Jun 22.

DOI: 10.1016/j.compbiomed.2023.107188
PMID: 37393785
Abstract

The missing data mechanism is a relevant problem in the Machine Learning (ML) and biomedical informatics communities. Real-world Electronic Health Record (EHR) datasets comprise several missing values, thus revealing a high level of spatiotemporal sparsity in the predictors' matrix. Several state-of-the-art approaches have tried to deal with this problem by proposing different data imputation strategies that (i) are often unrelated to the ML model, (ii) are not conceived for EHR data, where laboratory exams are not prescribed uniformly over time and the percentage of missing values is high, and (iii) exploit only univariate and linear information on the observed features. Our paper proposes a data imputation strategy based on a clinical conditional Generative Adversarial Network (ccGAN) capable of imputing missing values by exploiting non-linear and multivariate information across patients. Unlike other GAN-based data imputation approaches, our method deals explicitly with the high level of missingness of routine EHR data by conditioning the imputing strategy on the observable values and those fully annotated. We demonstrated statistically significant improvements of the ccGAN over other state-of-the-art approaches in terms of imputation (around 19.79% of gain over the best competitor) and predictive performance (up to 1.60% of gain over the best competitor) on a real multi-diabetic centers dataset. We also demonstrated its robustness across different missingness rates (up to 1.61% of gain over the best competitor in the highest missingness rates condition) on an additional benchmark EHR dataset.
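The abstract does not give the ccGAN architecture or training procedure, so the sketch below is not the authors' method. It only illustrates the general masked-imputation setup the abstract refers to: values are hidden at a chosen missingness rate, a generator proposes values for every cell, observed cells are kept as they are, and the error is measured on the hidden cells only. All names here (`make_missing`, `mean_generator`, `impute`, `masked_rmse`) are hypothetical, the data are synthetic, and `mean_generator` is a simple column-mean stand-in where a trained conditional generator would go.

```python
import numpy as np

def make_missing(x, rate, rng):
    """Hide entries at random. mask is 1.0 where a value is kept, 0.0 where hidden."""
    mask = (rng.random(x.shape) >= rate).astype(float)
    x_obs = np.where(mask == 1, x, np.nan)
    return x_obs, mask

def mean_generator(x_obs, mask):
    """Stand-in for a trained generator (assumption): predicts each column's observed mean."""
    col_means = np.nanmean(x_obs, axis=0)
    return np.broadcast_to(col_means, x_obs.shape)

def impute(x_obs, mask, generator):
    """Keep observed entries; fill only the hidden ones with the generator's output."""
    x_filled = np.nan_to_num(x_obs, nan=0.0)
    x_gen = generator(x_obs, mask)
    return mask * x_filled + (1.0 - mask) * x_gen

def masked_rmse(x_true, x_imputed, mask):
    """Root-mean-square error computed over the hidden entries only."""
    hidden = mask == 0
    return float(np.sqrt(np.mean((x_true[hidden] - x_imputed[hidden]) ** 2)))

# Synthetic stand-in for a patients-by-features matrix (not real EHR data).
rng = np.random.default_rng(0)
x_true = rng.normal(size=(200, 5))

# Evaluate the same imputer at several missingness rates.
for rate in (0.2, 0.5, 0.8):
    x_obs, mask = make_missing(x_true, rate, rng)
    x_imp = impute(x_obs, mask, mean_generator)
    print(f"missing rate {rate:.1f}: RMSE on hidden entries = {masked_rmse(x_true, x_imp, mask):.3f}")
```

Swapping `mean_generator` for any function with the same `(x_obs, mask)` signature allows different imputers to be compared under identical masks, which is the kind of comparison across missingness rates that the abstract reports.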


Similar articles

1
A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets.
Comput Biol Med. 2023 Sep;163:107188. doi: 10.1016/j.compbiomed.2023.107188. Epub 2023 Jun 22.
2
Generative adversarial networks for imputing missing data for big data clinical research.
BMC Med Res Methodol. 2021 Apr 20;21(1):78. doi: 10.1186/s12874-021-01272-3.
3
A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks.
Comput Biol Med. 2024 Jan;168:107687. doi: 10.1016/j.compbiomed.2023.107687. Epub 2023 Nov 14.
4
Extremely missing numerical data in Electronic Health Records for machine learning can be managed through simple imputation methods considering informative missingness: A comparative of solutions in a COVID-19 mortality case study.
Comput Methods Programs Biomed. 2023 Dec;242:107803. doi: 10.1016/j.cmpb.2023.107803. Epub 2023 Sep 7.
5
A hybrid of whale optimization and late acceptance hill climbing based imputation to enhance classification performance in electronic health records.
J Biomed Inform. 2019 Jun;94:103190. doi: 10.1016/j.jbi.2019.103190. Epub 2019 May 2.
6
PC-GAIN: Pseudo-label conditional generative adversarial imputation networks for incomplete data.
Neural Netw. 2021 Sep;141:395-403. doi: 10.1016/j.neunet.2021.05.033. Epub 2021 Jun 2.
7
Imputing Biomarker Status from RWE Datasets-A Comparative Study.
J Pers Med. 2021 Dec 13;11(12):1356. doi: 10.3390/jpm11121356.
8
Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems.
Proc Int Conf Mach Learn Appl. 2021 Dec;2021:791-798. doi: 10.1109/icmla52953.2021.00131.
9
Increasing the Density of Laboratory Measures for Machine Learning Applications.
J Clin Med. 2020 Dec 30;10(1):103. doi: 10.3390/jcm10010103.
10
Performance of Multiple Imputation Using Modern Machine Learning Methods in Electronic Health Records Data.
Epidemiology. 2023 Mar 1;34(2):206-215. doi: 10.1097/EDE.0000000000001578. Epub 2022 Dec 9.

Articles citing this paper

1
Generative artificial intelligence in diabetes healthcare.
iScience. 2025 Jul 5;28(8):113051. doi: 10.1016/j.isci.2025.113051. eCollection 2025 Aug 15.
2
Role of Generative Artificial Intelligence in Personalized Medicine: A Systematic Review.
Cureus. 2025 Apr 15;17(4):e82310. doi: 10.7759/cureus.82310. eCollection 2025 Apr.
3
Conceptual framework as a guide to choose appropriate imputation method for missing values in a clinical structured dataset.
BMC Med Res Methodol. 2025 Feb 20;25(1):43. doi: 10.1186/s12874-025-02496-3.
4
Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.
Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.
5
Transfer learning-enabled outcome prediction for guiding CRRT treatment of the pediatric patients with sepsis.
BMC Med Inform Decis Mak. 2024 Sep 27;24(1):266. doi: 10.1186/s12911-024-02623-y.
6
Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.
BMC Med Res Methodol. 2024 Aug 28;24(1):188. doi: 10.1186/s12874-024-02310-6.
7
Missing Data Imputation Method Combining Random Forest and Generative Adversarial Imputation Network.
Sensors (Basel). 2024 Feb 8;24(4):1112. doi: 10.3390/s24041112.
8
A deep learning transformer model predicts high rates of undiagnosed rare disease in large electronic health systems.
medRxiv. 2023 Dec 24:2023.12.21.23300393. doi: 10.1101/2023.12.21.23300393.