

Deep Neural Network Self-Distillation Exploiting Data Representation Invariance.

Publication Info

IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):257-269. doi: 10.1109/TNNLS.2020.3027634. Epub 2022 Jan 5.

DOI: 10.1109/TNNLS.2020.3027634
PMID: 33074828
Abstract

To obtain small networks with high accuracy, most existing methods rely on compression techniques such as low-rank decomposition and pruning to compress a trained large model into a small network, or transfer knowledge from a powerful large model (teacher) to a small network (student). Despite their success in generating small models of high performance, the dependence on accompanying assistive models complicates the training process and increases memory and time costs. In this article, we propose an elegant self-distillation (SD) mechanism that obtains high-accuracy models directly, without going through an assistive model. Inspired by invariant recognition in the human visual system, different distorted instances of the same input should possess similar high-level data representations, so we can learn data representation invariance between different distorted versions of the same sample. Specifically, in our SD-based learning algorithm, a single network uses the maximum mean discrepancy metric to learn global feature consistency and the Kullback-Leibler divergence to constrain posterior class-probability consistency across the different distorted branches. Extensive experiments on the MNIST, CIFAR-10/100, and ImageNet data sets demonstrate that the proposed method effectively reduces the generalization error for various network architectures, such as AlexNet, VGGNet, ResNet, Wide ResNet, and DenseNet, and outperforms existing model distillation methods with little extra training effort.
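For readers who think in code, here is a minimal PyTorch sketch of the training objective the abstract describes: one network processes two distorted views of the same batch, and consistency between the branches is enforced with an MMD term on features and a KL term on class posteriors. Everything here (the RBF-kernel MMD estimator, the (features, logits) model interface, and the weights alpha, beta, tau) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def rbf_mmd2(x, y, sigma=1.0):
    # Biased estimate of squared maximum mean discrepancy (MMD^2)
    # between feature batches x and y, using an RBF kernel.
    xx = torch.cdist(x, x).pow(2)
    yy = torch.cdist(y, y).pow(2)
    xy = torch.cdist(x, y).pow(2)
    k = lambda d2: torch.exp(-d2 / (2.0 * sigma ** 2))
    return k(xx).mean() + k(yy).mean() - 2.0 * k(xy).mean()

def sd_loss(model, x1, x2, labels, alpha=0.1, beta=0.1, tau=4.0):
    # x1, x2: two distorted (augmented) views of the same input batch.
    # Assumes `model(x)` returns (penultimate_features, logits); this
    # interface is a hypothetical stand-in, not the paper's actual API.
    feats1, logits1 = model(x1)
    feats2, logits2 = model(x2)

    # Supervised cross-entropy on both distorted branches.
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)

    # Global feature consistency across branches via MMD.
    mmd = rbf_mmd2(feats1, feats2)

    # Posterior class-probability consistency via KL divergence,
    # on temperature-softened distributions as is usual in distillation.
    log_p1 = F.log_softmax(logits1 / tau, dim=1)
    p2 = F.softmax(logits2 / tau, dim=1)
    kl = F.kl_div(log_p1, p2, reduction="batchmean")

    # alpha and beta are illustrative weights for the consistency terms.
    return ce + alpha * mmd + beta * kl
```

Note the single-network structure: unlike teacher-student distillation, no second model is instantiated; both branches share the same weights and differ only in the input distortion.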


Similar Articles

1. Deep Neural Network Self-Distillation Exploiting Data Representation Invariance.
IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):257-269. doi: 10.1109/TNNLS.2020.3027634. Epub 2022 Jan 5.
2. Restructuring the Teacher and Student in Self-Distillation.
IEEE Trans Image Process. 2024;33:5551-5563. doi: 10.1109/TIP.2024.3463421. Epub 2024 Oct 4.
3. Complementary label learning based on knowledge distillation.
Math Biosci Eng. 2023 Sep 19;20(10):17905-17918. doi: 10.3934/mbe.2023796.
4. Memory-Replay Knowledge Distillation.
Sensors (Basel). 2021 Apr 15;21(8):2792. doi: 10.3390/s21082792.
5. DCCD: Reducing Neural Network Redundancy via Distillation.
IEEE Trans Neural Netw Learn Syst. 2024 Jul;35(7):10006-10017. doi: 10.1109/TNNLS.2023.3238337. Epub 2024 Jul 8.
6. Adaptive Search-and-Training for Robust and Efficient Network Pruning.
IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):9325-9338. doi: 10.1109/TPAMI.2023.3248612. Epub 2023 Jun 30.
7. Self-knowledge distillation for surgical phase recognition.
Int J Comput Assist Radiol Surg. 2024 Jan;19(1):61-68. doi: 10.1007/s11548-023-02970-7. Epub 2023 Jun 20.
8. DMPP: Differentiable multi-pruner and predictor for neural network pruning.
Neural Netw. 2022 Mar;147:103-112. doi: 10.1016/j.neunet.2021.12.020. Epub 2021 Dec 30.
9. Weak sub-network pruning for strong and efficient neural networks.
Neural Netw. 2021 Dec;144:614-626. doi: 10.1016/j.neunet.2021.09.015. Epub 2021 Sep 30.
10. Attention Inspiring Receptive-Fields Network for Learning Invariant Representations.
IEEE Trans Neural Netw Learn Syst. 2019 Jun;30(6):1744-1755. doi: 10.1109/TNNLS.2018.2873722. Epub 2018 Oct 26.

Cited By

1. A multicenter proof-of-concept study on deep learning-based intraoperative discrimination of primary central nervous system lymphoma.
Nat Commun. 2024 May 4;15(1):3768. doi: 10.1038/s41467-024-48171-x.