
ResKD: Residual-Guided Knowledge Distillation

Authors

Li Xuewei, Li Songyuan, Omar Bourahla, Wu Fei, Li Xi

Publication

IEEE Trans Image Process. 2021;30:4735-4746. doi: 10.1109/TIP.2021.3066051. Epub 2021 May 5.

DOI: 10.1109/TIP.2021.3066051
PMID: 33739924
Abstract

Knowledge distillation, aimed at transferring the knowledge from a heavy teacher network to a lightweight student network, has emerged as a promising technique for compressing neural networks. However, due to the capacity gap between the heavy teacher and the lightweight student, there still exists a significant performance gap between them. In this article, we see knowledge distillation in a fresh light, using the knowledge gap, or the residual, between a teacher and a student as guidance to train a much more lightweight student, called a res-student. We combine the student and the res-student into a new student, where the res-student rectifies the errors of the former student. Such a residual-guided process can be repeated until the user strikes the balance between accuracy and cost. At inference time, we propose a sample-adaptive strategy to decide which res-students are not necessary for each sample, which can save computational cost. Experimental results show that we achieve competitive performance with 18.04%, 23.14%, 53.59%, and 56.86% of the teachers' computational costs on the CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet datasets. Finally, we do thorough theoretical and empirical analysis for our method.
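Reading the abstract as a recipe, the two key steps are (i) fitting a much smaller "res-student" to the residual between teacher and student outputs, and (ii) at inference, skipping res-students that a given sample does not need. The sketch below (Python/PyTorch) is a minimal illustration only, not the authors' implementation: it assumes logit-space residuals, a simple MSE regression loss, and a confidence-threshold skip rule; the names train_res_student, predict, and conf_threshold are hypothetical.

import torch
import torch.nn.functional as F

def train_res_student(teacher, student, res_student, loader, epochs=1, lr=1e-3):
    """Fit res_student to the residual (knowledge gap) between teacher and student logits."""
    teacher.eval(); student.eval(); res_student.train()
    opt = torch.optim.SGD(res_student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():
                residual = teacher(x) - student(x)       # knowledge gap the res-student should rectify
            loss = F.mse_loss(res_student(x), residual)   # one plausible regression loss (assumption)
            opt.zero_grad(); loss.backward(); opt.step()

@torch.no_grad()
def predict(student, res_students, x, conf_threshold=0.9):
    """Sample-adaptive inference: stop adding res-students once the prediction looks confident."""
    logits = student(x)
    for r in res_students:                                # res-students in the order they were trained
        if F.softmax(logits, dim=-1).max(dim=-1).values.min() >= conf_threshold:
            break                                         # confident enough: skip the remaining res-students
        logits = logits + r(x)                            # res-student corrects the current prediction
    return logits.argmax(dim=-1)

The residual-guided step can be repeated: treat (student + res-student) as the new student, measure its remaining gap to the teacher, and train another, even smaller res-student, stopping when the accuracy/cost trade-off is acceptable.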


Similar Articles

1. ResKD: Residual-Guided Knowledge Distillation.
   IEEE Trans Image Process. 2021;30:4735-4746. doi: 10.1109/TIP.2021.3066051. Epub 2021 May 5.
2. Highlight Every Step: Knowledge Distillation via Collaborative Teaching.
   IEEE Trans Cybern. 2022 Apr;52(4):2070-2081. doi: 10.1109/TCYB.2020.3007506. Epub 2022 Apr 5.
3. DCCD: Reducing Neural Network Redundancy via Distillation.
   IEEE Trans Neural Netw Learn Syst. 2024 Jul;35(7):10006-10017. doi: 10.1109/TNNLS.2023.3238337. Epub 2024 Jul 8.
4. Restructuring the Teacher and Student in Self-Distillation.
   IEEE Trans Image Process. 2024;33:5551-5563. doi: 10.1109/TIP.2024.3463421. Epub 2024 Oct 4.
5. Mitigating carbon footprint for knowledge distillation based deep learning model compression.
   PLoS One. 2023 May 15;18(5):e0285668. doi: 10.1371/journal.pone.0285668. eCollection 2023.
6. Multi-teacher knowledge distillation based on joint Guidance of Probe and Adaptive Corrector.
   Neural Netw. 2023 Jul;164:345-356. doi: 10.1016/j.neunet.2023.04.015. Epub 2023 Apr 26.
7. STKD: Distilling Knowledge From Synchronous Teaching for Efficient Model Compression.
   IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10051-10064. doi: 10.1109/TNNLS.2022.3164264. Epub 2023 Nov 30.
8. Knowledge Distillation for Semantic Segmentation Using Channel and Spatial Correlations and Adaptive Cross Entropy.
   Sensors (Basel). 2020 Aug 17;20(16):4616. doi: 10.3390/s20164616.
9. Adversarial learning-based multi-level dense-transmission knowledge distillation for AP-ROP detection.
   Med Image Anal. 2023 Feb;84:102725. doi: 10.1016/j.media.2022.102725. Epub 2022 Dec 9.
10. Lightweight Depth Completion Network with Local Similarity-Preserving Knowledge Distillation.
    Sensors (Basel). 2022 Sep 28;22(19):7388. doi: 10.3390/s22197388.