Partial Multiple Imputation With Variational Autoencoders: Tackling Not at Randomness in Healthcare Data.

Author information

Pereira Ricardo Cardoso, Abreu Pedro Henriques, Rodrigues Pedro Pereira

Publication information

IEEE J Biomed Health Inform. 2022 Aug;26(8):4218-4227. doi: 10.1109/JBHI.2022.3172656. Epub 2022 Aug 11.

DOI:10.1109/JBHI.2022.3172656
PMID:35511840
Abstract

Missing data can pose severe consequences in critical contexts, such as clinical research based on routinely collected healthcare data. This issue is usually handled with imputation strategies, but these tend to produce poor and biased results under the Missing Not At Random (MNAR) mechanism. A recent trend that has been showing promising results for MNAR is the use of generative models, particularly Variational Autoencoders. However, they have a limitation: the imputed values are the result of a single sample, which can be biased. To tackle it, an extension to the Variational Autoencoder that uses a partial multiple imputation procedure is introduced in this work. The proposed method was compared to 8 state-of-the-art imputation strategies, in an experimental setup with 34 datasets from the medical context, injected with the MNAR mechanism (10% to 80% rates). The results were evaluated through the Mean Absolute Error, with the new method being the overall best in 71% of the datasets, significantly outperforming the remaining ones, particularly for high missing rates. Finally, a case study of a classification task with heart failure data was also conducted, where this method induced improvements in 50% of the classifiers.
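The evaluation protocol described in the abstract (inject MNAR missingness at rates from 10% to 80%, impute, then score with Mean Absolute Error) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the MNAR mechanism was injected, so `inject_mnar` uses one common scheme (hiding the largest values of a feature, so that missingness depends on the value itself); `mean_impute` is a simple baseline standing in for an imputer and is not the proposed Variational Autoencoder method; the data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def inject_mnar(values, rate):
    """Return a boolean mask hiding the largest `rate` fraction of values.

    Assumed MNAR scheme: whether a value is missing depends on the value itself.
    """
    values = np.asarray(values, dtype=float)
    n_missing = int(round(rate * values.size))
    mask = np.zeros(values.size, dtype=bool)
    if n_missing > 0:
        # argsort is ascending, so the last n_missing indices hold the largest values
        mask[np.argsort(values)[-n_missing:]] = True
    return mask


def mean_impute(values, mask):
    """Baseline stand-in imputer: fill hidden entries with the observed mean."""
    filled = values.copy()
    filled[mask] = values[~mask].mean()
    return filled


def mae_on_missing(truth, imputed, mask):
    """Mean Absolute Error computed only over the entries that were hidden."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))


rng = np.random.default_rng(0)
truth = rng.normal(loc=120.0, scale=15.0, size=1000)  # synthetic feature, not real patient data

results = {}
for rate in [r / 10 for r in range(1, 9)]:  # 10% to 80% missing
    mask = inject_mnar(truth, rate)
    imputed = mean_impute(truth, mask)
    results[rate] = mae_on_missing(truth, imputed, mask)
    print(f"missing rate {rate:.0%}: MAE = {results[rate]:.2f}")
```

Because the hidden values are systematically larger than the observed ones, the observed mean underestimates every hidden entry, which is the kind of bias the abstract attributes to standard imputation strategies under MNAR.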


Similar articles

1
Partial Multiple Imputation With Variational Autoencoders: Tackling Not at Randomness in Healthcare Data.
IEEE J Biomed Health Inform. 2022 Aug;26(8):4218-4227. doi: 10.1109/JBHI.2022.3172656. Epub 2022 Aug 11.
2
Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors.
BMC Med Res Methodol. 2018 Aug 31;18(1):90. doi: 10.1186/s12874-018-0547-1.
3
A multiple imputation approach for MNAR mechanisms compatible with Heckman's model.
Stat Med. 2016 Jul 30;35(17):2907-20. doi: 10.1002/sim.6902. Epub 2016 Feb 18.
4
Non-linear missing data imputation for healthcare data via index-aware autoencoders.
Health Care Manag Sci. 2022 Sep;25(3):484-497. doi: 10.1007/s10729-022-09597-1. Epub 2022 Jun 23.
5
Outcome-sensitive multiple imputation: a simulation study.
BMC Med Res Methodol. 2017 Jan 9;17(1):2. doi: 10.1186/s12874-016-0281-5.
6
The multiple imputation method: a case study involving secondary data analysis.
Nurse Res. 2015 May;22(5):13-9. doi: 10.7748/nr.22.5.13.e1319.
7
Multiple imputation with sequential penalized regression.
Stat Methods Med Res. 2019 May;28(5):1311-1327. doi: 10.1177/0962280218755574. Epub 2018 Feb 16.
8
Multiple Imputation with Neural Network Gaussian Process for High-dimensional Incomplete Data.
Proc Mach Learn Res. 2022 Dec;189:265-279.
9
Advanced methods for missing values imputation based on similarity learning.
PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.
10
Multiple imputation: dealing with missing data.
Nephrol Dial Transplant. 2013 Oct;28(10):2415-20. doi: 10.1093/ndt/gft221. Epub 2013 May 31.

Cited by

1
Weighted-VAE: A deep learning approach for multimodal data generation applied to experimental T. cruzi infection.
PLoS One. 2025 Mar 24;20(3):e0315843. doi: 10.1371/journal.pone.0315843. eCollection 2025.
2
Conceptual framework as a guide to choose appropriate imputation method for missing values in a clinical structured dataset.
BMC Med Res Methodol. 2025 Feb 20;25(1):43. doi: 10.1186/s12874-025-02496-3.
3
Comparative study of imputation strategies to improve the sarcopenia prediction task.
Digit Health. 2025 Jan 17;11:20552076241301960. doi: 10.1177/20552076241301960. eCollection 2025 Jan-Dec.
4
Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.
Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.
5
Identify the most appropriate imputation method for handling missing values in clinical structured datasets: a systematic review.
BMC Med Res Methodol. 2024 Aug 28;24(1):188. doi: 10.1186/s12874-024-02310-6.
6
A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative.
J Biomed Inform. 2023 Mar;139:104295. doi: 10.1016/j.jbi.2023.104295. Epub 2023 Jan 27.