Dai Zongyu, Bu Zhiqi, Long Qi
Department of AMCS, University of Pennsylvania, Philadelphia, USA.
Division of Biostatistics, University of Pennsylvania, Philadelphia, USA.
Proc Int Conf Mach Learn Appl. 2021 Dec;2021:791-798. doi: 10.1109/icmla52953.2021.00131.
Missing data are present in most real-world problems and need careful handling to preserve prediction accuracy and statistical consistency in downstream analysis. As the gold standard for handling missing data, multiple imputation (MI) methods have been proposed to account for imputation uncertainty and provide proper statistical inference. In this work, we propose Multiple Imputation via Generative Adversarial Network (MI-GAN), a deep learning-based (specifically, GAN-based) multiple imputation method that works under the missing at random (MAR) mechanism with theoretical support. MI-GAN leverages recent progress in conditional generative adversarial networks and shows strong performance, matching existing state-of-the-art imputation methods on high-dimensional datasets in terms of imputation error. In particular, MI-GAN significantly outperforms other imputation methods in terms of statistical inference and computational speed.
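Multiple imputation supports proper inference because the analysis is repeated on each of the M imputed datasets and the results are then pooled so that the spread across imputations enters the final uncertainty. The sketch below shows the standard pooling step (Rubin's rules) as a generic illustration; it is not MI-GAN's own code. It assumes each imputed dataset has already been analyzed to give a scalar estimate and its variance, and the function name `pool_rubin` is hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M per-imputation results with Rubin's rules (hypothetical helper).

    estimates: point estimates, one per imputed dataset
    variances: within-imputation variances of those estimates
    Returns (pooled estimate, total variance, standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)       # pooled point estimate: average over imputations
    w = mean(variances)           # within-imputation variance: average sampling variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b       # total variance inflates W by the imputation spread
    return q_bar, t, math.sqrt(t)


# Usage: five imputations, each analyzed to give an estimate and its variance.
q, t, se = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(q, t, se)
```

The between-imputation term B is what a single imputation would miss: with only one completed dataset, the reported variance would be W alone and confidence intervals would be too narrow.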