Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems.

Authors

Dai Zongyu, Bu Zhiqi, Long Qi

Affiliations

Department of AMCS, University of Pennsylvania, Philadelphia, USA.

Division of Biostatistics, University of Pennsylvania, Philadelphia, USA.

Publication information

Proc Int Conf Mach Learn Appl. 2021 Dec;2021:791-798. doi: 10.1109/icmla52953.2021.00131.

Abstract

Missing data are present in most real-world problems and need careful handling to preserve prediction accuracy and statistical consistency in downstream analysis. Multiple imputation (MI), the gold standard for handling missing data, accounts for imputation uncertainty and provides proper statistical inference. In this work, we propose Multiple Imputation via Generative Adversarial Network (MI-GAN), a deep learning-based (specifically, GAN-based) multiple imputation method that can work under the missing at random (MAR) mechanism with theoretical support. MI-GAN leverages recent progress in conditional generative adversarial neural networks and, on high-dimensional datasets, matches existing state-of-the-art imputation methods in terms of imputation error. In particular, MI-GAN significantly outperforms other imputation methods in terms of statistical inference and computational speed.
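
To illustrate what "accounting for imputation uncertainty" usually means in multiple imputation, the sketch below pools estimates from several imputed datasets with the standard Rubin's rules. It is not taken from the paper and says nothing about how MI-GAN generates its imputations; the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules (illustrative sketch).

    estimates: point estimates of one parameter, one per imputed dataset.
    variances: the corresponding within-imputation variance estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Hypothetical results from m = 3 imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The between-imputation term is what a single imputation cannot provide: it inflates the variance to reflect that the missing values were never observed.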
