Gou Jianping, Sun Liyuan, Yu Baosheng, Du Lan, Ramamohanarao Kotagiri, Tao Dacheng
IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6718-6730. doi: 10.1109/TNNLS.2022.3212733. Epub 2024 May 2.
Knowledge distillation (KD), as an efficient and effective model compression technique, has received considerable attention in deep learning. The key to its success lies in transferring knowledge from a large teacher network to a small student network. However, most existing KD methods consider only one type of knowledge, learned from either instance features or instance relations via a specific distillation strategy, failing to explore the idea of transferring different types of knowledge with different distillation strategies. Moreover, the widely used offline distillation also suffers from a limited learning capacity due to the fixed large-to-small teacher-student architecture. In this article, we devise a collaborative KD via multiknowledge transfer (CKD-MKT) that promotes both self-learning and collaborative learning in a unified framework. Specifically, CKD-MKT utilizes a multiple knowledge transfer framework that assembles self and online distillation strategies to effectively: 1) fuse different kinds of knowledge, which allows multiple students to learn knowledge from both individual instances and instance relations, and 2) guide each other by learning from themselves using collaborative and self-learning. Experiments and ablation studies on six image datasets demonstrate that the proposed CKD-MKT significantly outperforms recent state-of-the-art methods for KD.
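The abstract does not detail CKD-MKT's losses, but the instance-level knowledge transfer it builds on is conventionally formulated as the temperature-scaled distillation loss of Hinton et al.: the student matches the teacher's softened output distribution via a KL divergence. A minimal, dependency-free sketch (function names and the temperature value are illustrative, not taken from the paper):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T softens the distribution,
    # exposing the "dark knowledge" in the teacher's non-target logits.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 so gradients keep a comparable magnitude across T.
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl

# When the student exactly matches the teacher, the loss vanishes:
print(round(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]), 6))  # 0.0
```

In practice this term is combined with a standard cross-entropy loss on ground-truth labels; relation-based variants instead match pairwise similarities between instance embeddings rather than per-instance outputs.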