Missing data imputation with adversarially-trained graph convolutional networks.

Affiliations

Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, Via Eudossiana 18, 00184 Rome, Italy.

Publication information

Neural Netw. 2020 Sep;129:249-260. doi: 10.1016/j.neunet.2020.06.005. Epub 2020 Jun 13.

Abstract

Missing data imputation (MDI) is the task of replacing missing values in a dataset with alternative, predicted ones. Because of the widespread presence of missing data, it is a fundamental problem in many scientific disciplines. Popular methods for MDI use global statistics computed from the entire dataset (e.g., the feature-wise medians), or build predictive models operating independently on every instance. In this paper we propose a more general framework for MDI, leveraging recent work in the field of graph neural networks (GNNs). We formulate the MDI task in terms of a graph denoising autoencoder, where each edge of the graph encodes the similarity between two patterns. A GNN encoder learns to build intermediate representations for each example by interleaving classical projection layers with steps that locally combine information between neighbors, while a second, decoding GNN learns to reconstruct the full imputed dataset from this intermediate embedding. To speed up training and improve performance, we use a combination of multiple losses, including an adversarial loss implemented with the Wasserstein metric and a gradient penalty. We also explore a few extensions to the basic architecture: residual connections between layers, and global statistics computed from the dataset to improve accuracy. In a large experimental evaluation with varying levels of artificial noise, we show that our method is on par with or better than several alternative imputation methods. On three datasets with pre-existing missing values, we show that our method is robust to the choice of downstream classifier, obtaining similar or slightly higher results compared to other choices.
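The graph denoising autoencoder described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The k-nearest-neighbor graph, the distance computed over jointly observed features, the zero-filling of missing entries, the layer sizes, and all function names are assumptions made for illustration. The weights are random and untrained, and the adversarial Wasserstein loss with gradient penalty is omitted; only a masked reconstruction loss is shown.

```python
import numpy as np

def knn_similarity_graph(X, mask, k=3):
    """Symmetric k-NN adjacency between rows (patterns) of X.
    Distances use only features observed in both rows (mask == 1)."""
    n = X.shape[0]
    dist = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = (mask[i] * mask[j]).astype(bool)
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dist[i, j] = np.sqrt(np.mean(diff ** 2))
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:
            if np.isfinite(dist[i, j]):
                A[i, j] = A[j, i] = 1.0  # keep the graph undirected
    return A

def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(A_norm, H, W, relu=True):
    """Projection (H @ W) followed by neighbor aggregation (A_norm @ ...)."""
    out = A_norm @ (H @ W)
    return np.maximum(out, 0.0) if relu else out

def masked_mse(X, X_hat, mask):
    """Reconstruction error measured on observed entries only."""
    return np.sum(mask * (X - X_hat) ** 2) / np.sum(mask)

rng = np.random.default_rng(0)
X_true = rng.normal(size=(8, 4))                  # toy data: 8 patterns, 4 features
mask = (rng.random((8, 4)) > 0.2).astype(float)   # 1 = observed, 0 = missing
X_filled = X_true * mask                          # zero-fill missing entries

A_norm = normalize_adjacency(knn_similarity_graph(X_filled, mask, k=3))

W_enc = rng.normal(scale=0.1, size=(4, 6))        # untrained encoder weights
W_dec = rng.normal(scale=0.1, size=(6, 4))        # untrained decoder weights

Z = graph_conv(A_norm, X_filled, W_enc)           # encoder: per-pattern embedding
X_hat = graph_conv(A_norm, Z, W_dec, relu=False)  # decoder: full reconstruction

loss = masked_mse(X_true, X_hat, mask)
# Keep observed values, take the reconstruction only where data was missing.
X_imputed = mask * X_true + (1.0 - mask) * X_hat
```

In a trained model the weights would be fitted by minimizing the reconstruction loss, combined per the abstract with an adversarial term. The final line shows the usual imputation convention of overwriting only the missing entries.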

