Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias.

Author Information

Shi Zhihao, Wang Jie, Lu Fanghua, Chen Hanzhu, Lian Defu, Wang Zheng, Ye Jieping, Wu Feng

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11273-11286. doi: 10.1109/TPAMI.2024.3459408. Epub 2024 Nov 6.

Abstract

Node representation learning on attributed graphs, whose nodes are associated with rich attributes (e.g., texts and protein sequences), plays a crucial role in many important downstream tasks. To encode the attributes and graph structures simultaneously, recent studies integrate pre-trained models with graph neural networks (GNNs), where the pre-trained models serve as node encoders (NEs) that encode the attributes. As jointly training large NEs and GNNs on large-scale graphs suffers from severe scalability issues, many methods propose to train NEs and GNNs separately. Consequently, these methods do not take the feature convolutions in GNNs into account during the training phase of the NEs, leading to a significant learning bias relative to joint training. To address this challenge, we propose an efficient label regularization technique, namely Label Deconvolution (LD), which alleviates the learning bias through a novel and highly scalable approximation to the inverse mapping of GNNs. The inverse mapping yields an objective function equivalent to that of joint training, while effectively incorporating GNNs into the training phase of the NEs to counteract the learning bias. More importantly, we show that LD converges to the optimal objective function values of joint training under mild assumptions. Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
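The abstract only sketches the mechanism, so below is a minimal, hypothetical PyTorch sketch of the core idea as the abstract describes it: approximate the inverse mapping of the GNN's feature convolution by a learnable combination of multi-hop label propagations, and train the node encoder against the resulting "deconvolved" labels instead of the raw ones. The polynomial parameterization, the joint learning of the mixture weights with the encoder, and all names (`LabelDeconvolution`, `propagate_labels`, `A_hat`, `gamma`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only; the parameterization and training recipe are assumptions.
# Idea: if the GNN's feature convolution acts roughly like a polynomial of the
# normalized adjacency A_hat, its inverse mapping on the labels can be approximated
# by a learnable mixture of multi-hop propagations,
#   Y_deconv ~= sum_k gamma_k * (A_hat^k @ Y),
# and the node encoder (NE) is trained against Y_deconv instead of the raw Y.

def propagate_labels(A_hat: torch.Tensor, Y: torch.Tensor, num_hops: int) -> torch.Tensor:
    """Precompute [Y, A_hat @ Y, ..., A_hat^K @ Y]; returns shape (K+1, N, C)."""
    outs = [Y]
    for _ in range(num_hops):
        outs.append(A_hat @ outs[-1])
    return torch.stack(outs)

class LabelDeconvolution(torch.nn.Module):
    """Learnable convex mixture over propagated labels (approximate GNN inverse)."""
    def __init__(self, num_hops: int):
        super().__init__()
        self.gamma = torch.nn.Parameter(torch.zeros(num_hops + 1))

    def forward(self, propagated: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gamma, dim=0)           # convex combination weights
        return (w[:, None, None] * propagated).sum(0)  # (N, C) soft targets

# Toy usage: a linear layer stands in for a large pre-trained node encoder.
N, D, C, K = 8, 16, 3, 2
A_hat = torch.rand(N, N)
A_hat = A_hat / A_hat.sum(dim=1, keepdim=True)          # row-normalized toy adjacency
X = torch.randn(N, D)                                   # node attributes
Y = F.one_hot(torch.randint(0, C, (N,)), C).float()     # one-hot labels

encoder = torch.nn.Linear(D, C)                         # placeholder NE
ld = LabelDeconvolution(K)
propagated = propagate_labels(A_hat, Y, K)              # graph touched only once

opt = torch.optim.Adam(list(encoder.parameters()) + list(ld.parameters()), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    targets = ld(propagated).clamp_min(1e-8)
    targets = targets / targets.sum(dim=1, keepdim=True)   # keep rows a distribution
    loss = F.cross_entropy(encoder(X), targets)            # soft-target CE (PyTorch >= 1.10)
    loss.backward()
    opt.step()
```

If the approximation does take a form like this, the multi-hop propagations can be precomputed once, so the NE training loop involves no graph operations at all, which would account for the scalability claim in the abstract.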

