On Representation Knowledge Distillation for Graph Neural Networks.

Author Information

Joshi Chaitanya K, Liu Fayao, Xun Xu, Lin Jie, Foo Chuan Sheng

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Apr;35(4):4656-4667. doi: 10.1109/TNNLS.2022.3223018. Epub 2024 Apr 4.

Abstract

Knowledge distillation (KD) is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the local structure preserving (LSP) loss, which matches local structural relationships defined over edges across the student and teacher's node embeddings. This article studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose graph contrastive representation distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across four datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving (GSP) variant of LSP) as well as baselines from 2-D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other.
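To make the core idea concrete, below is a minimal sketch of a G-CRD-style contrastive distillation objective: student node embeddings are projected into the teacher's representation space, the teacher embedding of the same node serves as the positive, and the other nodes in the batch act as negatives (an InfoNCE-style loss). This is an illustration under stated assumptions, not the authors' exact formulation; the projection head, the temperature value, and in-batch negative sampling are all assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveDistillationLoss(nn.Module):
    """Sketch of a G-CRD-style contrastive distillation loss over nodes.

    Aligns each student node embedding with the frozen teacher's
    embedding of the same node in a shared representation space, with
    all other in-batch nodes as negatives (InfoNCE).
    """

    def __init__(self, student_dim: int, teacher_dim: int, tau: float = 0.1):
        super().__init__()
        # Linear head mapping student embeddings into the teacher's space
        # (assumed design; the paper's exact head may differ).
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.tau = tau  # softmax temperature; 0.1 is an assumed value

    def forward(self, h_student: torch.Tensor,
                h_teacher: torch.Tensor) -> torch.Tensor:
        # h_student: [N, student_dim], h_teacher: [N, teacher_dim]
        z_s = F.normalize(self.proj(h_student), dim=-1)
        z_t = F.normalize(h_teacher.detach(), dim=-1)  # teacher is frozen
        logits = z_s @ z_t.t() / self.tau              # [N, N] similarities
        # Diagonal entries are the positive (same-node) pairs.
        targets = torch.arange(z_s.size(0), device=z_s.device)
        return F.cross_entropy(logits, targets)
```

In use, a term like this would be added to the student's task loss with a weighting coefficient, e.g. `loss = task_loss + beta * kd_loss(h_s, h_t)`, so the student learns the downstream task while implicitly matching the global topology of the teacher's embedding space.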
