
Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs.

Affiliations

School of Economics and Management, University of Chinese Academy of Sciences, Beijing, 100190, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, 100190, China.

School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 100049, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, 100190, China.

Publication Information

Neural Netw. 2024 Nov;179:106567. doi: 10.1016/j.neunet.2024.106567. Epub 2024 Jul 23.

Abstract

While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which learns high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, which leaves the significance of logit-layer distillation greatly overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN's prediction confidence and the NCGD loss, and eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets demonstrate the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD.
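
The abstract does not spell out the loss formulation, so the following is a minimal PyTorch-style sketch only: it assumes the TCGD/NCGD split mirrors the standard decoupled-KD decomposition of the softened KL divergence into a target-class (binary) term and a non-target-class term. The function name decoupled_graph_kd_loss, the temperature T, and the scalar weights alpha and beta are illustrative assumptions, not taken from the paper or its repository.

import torch
import torch.nn.functional as F

def decoupled_graph_kd_loss(student_logits, teacher_logits, target,
                            alpha=1.0, beta=1.0, T=1.0):
    """Hypothetical sketch: split a logits-based KD loss into a target-class
    term (TCGD-like) and a non-target-class term (NCGD-like)."""
    p_t = F.softmax(teacher_logits / T, dim=1)  # teacher probabilities, (N, C)
    p_s = F.softmax(student_logits / T, dim=1)  # student probabilities, (N, C)

    # Probability mass on the target class vs. all remaining classes.
    tgt = target.unsqueeze(1)                   # (N, 1) class indices
    pt_t = p_t.gather(1, tgt)                   # teacher target-class probability
    pt_s = p_s.gather(1, tgt)                   # student target-class probability
    b_t = torch.cat([pt_t, 1.0 - pt_t], dim=1).clamp_min(1e-8)
    b_s = torch.cat([pt_s, 1.0 - pt_s], dim=1).clamp_min(1e-8)

    # Target-class term: KL between the binary (target vs. rest) distributions.
    tcgd = (b_t * (b_t.log() - b_s.log())).sum(dim=1)

    # Non-target-class term: KL between distributions renormalized over the
    # non-target classes only (target column masked out).
    mask = torch.ones_like(p_t).scatter_(1, tgt, 0.0)
    q_t = (p_t * mask) / (1.0 - pt_t).clamp_min(1e-8)
    q_s = (p_s * mask) / (1.0 - pt_s).clamp_min(1e-8)
    ncgd = (q_t * (q_t.clamp_min(1e-8).log() - q_s.clamp_min(1e-8).log())).sum(dim=1)

    # In the classical (coupled) formulation the non-target term carries an
    # implicit weight of (1 - pt_t), tying it to the teacher's confidence.
    # A decoupled variant uses free coefficients instead; DGKD adjusts them
    # per sample, while plain scalars are used here for illustration.
    return (alpha * tcgd + beta * ncgd).mean() * (T ** 2)

Under this decomposition, replacing beta with the per-sample factor (1 - pt_t) recovers the classical softened KL loss, which is precisely the negative coupling between teacher confidence and the non-target term that the abstract says DGKD removes.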

