Li Xuewei, Li Songyuan, Omar Bourahla, Wu Fei, Li Xi
IEEE Trans Image Process. 2021;30:4735-4746. doi: 10.1109/TIP.2021.3066051. Epub 2021 May 5.
Knowledge distillation, which aims to transfer knowledge from a heavy teacher network to a lightweight student network, has emerged as a promising technique for compressing neural networks. However, due to the capacity gap between the heavy teacher and the lightweight student, a significant performance gap remains between them. In this article, we see knowledge distillation in a fresh light, using the knowledge gap, or the residual, between a teacher and a student as guidance to train a much more lightweight student, called a res-student. We combine the student and the res-student into a new student, in which the res-student rectifies the errors of the former student. Such a residual-guided process can be repeated until the user strikes the desired balance between accuracy and cost. At inference time, we propose a sample-adaptive strategy that decides, for each sample, which res-students are unnecessary, saving computational cost. Experimental results show that we achieve competitive performance with 18.04%, 23.14%, 53.59%, and 56.86% of the teachers' computational costs on the CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet datasets, respectively. Finally, we provide a thorough theoretical and empirical analysis of our method.
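To make the residual-guided idea concrete, the PyTorch sketch below shows one plausible reading of it under stated assumptions: a small res-student is trained to regress the logit gap between a frozen teacher and a frozen base student, and a margin-based heuristic decides per sample whether the res-student is needed at inference. The network definitions, the MSE residual loss, the `gap_threshold` parameter, and the uncertainty heuristic are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of residual-guided distillation; all design choices
# (architectures, loss, threshold heuristic) are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_cnn(width: int, num_classes: int = 10) -> nn.Module:
    """Tiny CNN; `width` loosely controls capacity (teacher > student > res-student)."""
    return nn.Sequential(
        nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(width, num_classes),
    )


teacher, student, res_student = make_cnn(128), make_cnn(32), make_cnn(16)

# Assume the teacher is pre-trained and the student has already been distilled
# from it; both are frozen while the res-student is trained on their residual.
for m in (teacher, student):
    m.eval()
    for p in m.parameters():
        p.requires_grad_(False)

optimizer = torch.optim.SGD(res_student.parameters(), lr=0.05, momentum=0.9)


def res_student_step(images: torch.Tensor) -> torch.Tensor:
    """One training step: fit the res-student to the teacher-student logit gap."""
    with torch.no_grad():
        residual_target = teacher(images) - student(images)  # the "knowledge gap"
    loss = F.mse_loss(res_student(images), residual_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss


@torch.no_grad()
def predict(images: torch.Tensor, gap_threshold: float = 0.5) -> torch.Tensor:
    """Sample-adaptive inference (illustrative): apply the res-student only to
    samples whose base-student prediction has a small top-2 probability margin."""
    logits = student(images)
    probs = F.softmax(logits, dim=1)
    top2 = probs.topk(2, dim=1).values
    uncertain = (top2[:, 0] - top2[:, 1]) < gap_threshold  # small margin => uncertain
    if uncertain.any():
        logits[uncertain] = logits[uncertain] + res_student(images[uncertain])
    return logits.argmax(dim=1)


if __name__ == "__main__":
    x = torch.randn(8, 3, 32, 32)  # dummy CIFAR-sized batch
    print("residual loss:", res_student_step(x).item())
    print("predictions:", predict(x))
```

In this reading, repeating the residual-guided process would simply add another, even smaller res-student fitted to the gap left by the combined student, and the per-sample gating is what allows cheap samples to skip the extra forward passes.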