Feng Kaituo, Miao Yikun, Li Changsheng, Yuan Ye, Wang Guoren
IEEE Trans Pattern Anal Mach Intell. 2025 Jun;47(6):4377-4394. doi: 10.1109/TPAMI.2025.3543211. Epub 2025 May 7.
Knowledge distillation (KD) has been shown to be effective in boosting the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, training a satisfactory deeper GNN is often quite challenging due to the well-known over-parameterization and over-smoothing issues, which leads to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher GNN. Our core idea is to collaboratively learn two shallower GNNs so that they exchange knowledge with each other via reinforcement learning in a hierarchical way. Since we observe that a typical GNN model often performs better at some nodes and worse at others during training, we devise a dynamic, free-direction knowledge transfer strategy that involves two levels of actions: 1) a node-level action determines the direction of knowledge transfer between the corresponding nodes of the two networks; and then 2) a structure-level action determines which of the local structures generated by the node-level actions are propagated. Additionally, considering that different augmented graphs can capture distinct perspectives or representations of the graph data, we propose FreeKD-Prompt, which learns undistorted and diverse augmentations based on prompt learning for exchanging varied knowledge. Furthermore, instead of confining knowledge exchange to two GNNs, we develop FreeKD++ and FreeKD-Prompt++ to enable free-direction knowledge transfer among multiple shallow GNNs. Extensive experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin and are effective with various types of GNNs. More surprisingly, our FreeKD achieves comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
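To make the two-level, free-direction exchange concrete, the following is a minimal, hypothetical sketch in PyTorch. It is not the authors' implementation: the paper's reinforcement-learning agents are replaced by simple heuristics (per-node cross-entropy stands in for the node-level action, and neighbourhood agreement stands in for the structure-level action), and all names here (gnn_a, gnn_b, exchange_loss, etc.) are illustrative only.

```python
# Hypothetical sketch of free-direction knowledge exchange between two shallow
# models A and B. Per-node loss comparison replaces the RL node-level action;
# neighbourhood voting replaces the RL structure-level action.
import torch
import torch.nn.functional as F


def node_level_direction(logits_a, logits_b, labels):
    """Per node, pick the model that currently fits better (lower CE loss).

    Returns a bool tensor: True means A teaches B at that node, False means
    B teaches A. (In the paper this decision is made by an RL agent.)
    """
    ce_a = F.cross_entropy(logits_a, labels, reduction="none")
    ce_b = F.cross_entropy(logits_b, labels, reduction="none")
    return ce_a < ce_b


def structure_level_mask(direction, adj, threshold=0.5):
    """Keep a node's local structure only if most neighbours agree on the
    same transfer direction (a stand-in for the structure-level action)."""
    d = direction.float()
    # Fraction of each node's neighbours where A is the better model.
    agree = (adj @ d) / adj.sum(dim=1).clamp(min=1)
    keep_a = direction & (agree >= threshold)         # propagate A -> B here
    keep_b = (~direction) & (agree <= 1 - threshold)  # propagate B -> A here
    return keep_a, keep_b


def exchange_loss(logits_a, logits_b, labels, adj, tau=2.0):
    """Soft-label distillation in both directions, masked by the two 'actions'."""
    direction = node_level_direction(logits_a, logits_b, labels)
    keep_a, keep_b = structure_level_mask(direction, adj)
    kd_a_to_b = F.kl_div(F.log_softmax(logits_b / tau, -1),
                         F.softmax(logits_a.detach() / tau, -1),
                         reduction="none").sum(-1)
    kd_b_to_a = F.kl_div(F.log_softmax(logits_a / tau, -1),
                         F.softmax(logits_b.detach() / tau, -1),
                         reduction="none").sum(-1)
    return (kd_a_to_b * keep_a.float()).mean() + (kd_b_to_a * keep_b.float()).mean()
```

Because the direction is decided node by node, knowledge can flow A-to-B at some nodes and B-to-A at others within the same training step, which is the essence of "free-direction" transfer; on unlabeled nodes a confidence measure could plausibly replace the ground-truth labels used in this sketch.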