Deep Neural Network Self-Distillation Exploiting Data Representation Invariance.

Publication Information

IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):257-269. doi: 10.1109/TNNLS.2020.3027634. Epub 2022 Jan 5.

Abstract

To harvest small networks with high accuracies, most existing methods mainly utilize compression techniques, such as low-rank decomposition and pruning, to compress a trained large model into a small network, or transfer knowledge from a powerful large model (teacher) to a small network (student). Despite their success in generating small models of high performance, the dependence on accompanying assistive models complicates the training process and increases memory and time costs. In this article, we propose an elegant self-distillation (SD) mechanism to obtain high-accuracy models directly, without going through an assistive model. Inspired by invariant recognition in the human visual system, different distorted instances of the same input should possess similar high-level data representations. Thus, we can learn data representation invariance between different distorted versions of the same sample. Specifically, in our learning algorithm based on SD, the single network utilizes the maximum mean discrepancy (MMD) metric to learn global feature consistency and the Kullback-Leibler (KL) divergence to constrain posterior class probability consistency across the different distorted branches. Extensive experiments on the MNIST, CIFAR-10/100, and ImageNet data sets demonstrate that the proposed method can effectively reduce the generalization error for various network architectures, such as AlexNet, VGGNet, ResNet, Wide ResNet, and DenseNet, and outperform existing model distillation methods with little extra training effort.
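The loss structure described in the abstract — a supervised term plus an MMD penalty on features and a symmetric KL penalty on class probabilities across two distorted views of the same batch — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian-kernel bandwidth `sigma` and the weights `alpha` and `beta` are illustrative assumptions, and a real training loop would compute these terms on network outputs and backpropagate through them.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian kernel matrix between rows of x and rows of y.
    d2 = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(f1, f2, sigma=1.0):
    # Squared maximum mean discrepancy between two batches of features:
    # small when the two distorted branches produce similar feature distributions.
    kxx = gaussian_kernel(f1, f1, sigma)
    kyy = gaussian_kernel(f2, f2, sigma)
    kxy = gaussian_kernel(f1, f2, sigma)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl(p, q, eps=1e-12):
    # Mean KL divergence KL(p || q) over a batch of class distributions.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1).mean()

def self_distillation_loss(logits1, logits2, feats1, feats2, labels,
                           alpha=0.1, beta=0.1):
    # Cross-entropy on both distorted branches, plus MMD feature-consistency
    # and symmetric KL probability-consistency terms between the branches.
    n = logits1.shape[0]
    p1, p2 = softmax(logits1), softmax(logits2)
    ce = (-np.log(p1[np.arange(n), labels] + 1e-12).mean()
          - np.log(p2[np.arange(n), labels] + 1e-12).mean())
    consistency = beta * 0.5 * (kl(p1, p2) + kl(p2, p1))
    return ce + alpha * mmd2(feats1, feats2) + consistency
```

Because both branches share one network, no teacher model needs to be stored: the two terms simply pull the representations of two augmentations of the same sample toward each other.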
