
Restructuring the Teacher and Student in Self-Distillation.

Author information

Zheng Yujie, Wang Chong, Tao Chenchen, Lin Sunqi, Qian Jiangbo, Wu Jiafei

Publication information

IEEE Trans Image Process. 2024;33:5551-5563. doi: 10.1109/TIP.2024.3463421. Epub 2024 Oct 4.

DOI: 10.1109/TIP.2024.3463421
PMID: 39316482
Abstract

Knowledge distillation aims to achieve model compression by transferring knowledge from complex teacher models to lightweight student models. To reduce reliance on pre-trained teacher models, self-distillation methods utilize knowledge from the model itself as additional supervision. However, their performance is limited by the same or similar network architecture between the teacher and student. In order to increase architecture variety, we propose a new self-distillation framework called restructured self-distillation (RSD), which involves restructuring both the teacher and student networks. The self-distilled model is expanded into a multi-branch topology to create a more powerful teacher. During training, diverse student sub-networks are generated by randomly discarding the teacher's branches. Additionally, the teacher and student models are linked by a randomly inserted feature mixture block, introducing additional knowledge distillation in the mixed feature space. To avoid extra inference costs, the branches of the teacher model are then converted back to its original structure equivalently. Comprehensive experiments have demonstrated the effectiveness of our proposed framework for most architectures on CIFAR-10/100 and ImageNet datasets. Code is available at https://github.com/YujieZheng99/RSD.
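
The abstract states that the teacher's extra branches are "converted back to its original structure equivalently" after training, so the stronger multi-branch teacher adds no inference cost. The short PyTorch sketch below illustrates the re-parameterization idea this relies on, under a simplifying assumption: parallel 3x3 convolutions of identical shape whose outputs are summed can be merged into a single equivalent convolution by summing their kernels and biases. This is not the authors' implementation (the class name MultiBranchConv, the two-branch setup, and all shapes are illustrative assumptions); see https://github.com/YujieZheng99/RSD for the official code.

import torch
import torch.nn as nn

class MultiBranchConv(nn.Module):
    # Training-time block: several parallel 3x3 convolutions whose outputs are summed.
    def __init__(self, channels, num_branches=2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_branches)
        )

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

    def reparameterize(self):
        # Convolution is linear, so summing branch outputs equals convolving with the
        # summed kernels; one conv therefore reproduces the multi-branch block exactly.
        fused = nn.Conv2d(self.branches[0].in_channels,
                          self.branches[0].out_channels,
                          kernel_size=3, padding=1)
        with torch.no_grad():
            fused.weight.copy_(sum(b.weight for b in self.branches))
            fused.bias.copy_(sum(b.bias for b in self.branches))
        return fused

# Sanity check: the fused single conv matches the multi-branch output.
block = MultiBranchConv(channels=8)
x = torch.randn(1, 8, 16, 16)
assert torch.allclose(block(x), block.reparameterize()(x), atol=1e-5)

The random discarding of branches during training (which yields the student sub-networks) and the feature mixture block described in the abstract are omitted from this sketch.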


Similar articles

1. Restructuring the Teacher and Student in Self-Distillation.
   IEEE Trans Image Process. 2024;33:5551-5563. doi: 10.1109/TIP.2024.3463421. Epub 2024 Oct 4.
2. STKD: Distilling Knowledge From Synchronous Teaching for Efficient Model Compression.
   IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10051-10064. doi: 10.1109/TNNLS.2022.3164264. Epub 2023 Nov 30.
3. Teacher-student complementary sample contrastive distillation.
   Neural Netw. 2024 Feb;170:176-189. doi: 10.1016/j.neunet.2023.11.036. Epub 2023 Nov 17.
4. DCCD: Reducing Neural Network Redundancy via Distillation.
   IEEE Trans Neural Netw Learn Syst. 2024 Jul;35(7):10006-10017. doi: 10.1109/TNNLS.2023.3238337. Epub 2024 Jul 8.
5. ResKD: Residual-Guided Knowledge Distillation.
   IEEE Trans Image Process. 2021;30:4735-4746. doi: 10.1109/TIP.2021.3066051. Epub 2021 May 5.
6. MSKD: Structured knowledge distillation for efficient medical image segmentation.
   Comput Biol Med. 2023 Sep;164:107284. doi: 10.1016/j.compbiomed.2023.107284. Epub 2023 Aug 2.
7. Memory-Replay Knowledge Distillation.
   Sensors (Basel). 2021 Apr 15;21(8):2792. doi: 10.3390/s21082792.
8. On Representation Knowledge Distillation for Graph Neural Networks.
   IEEE Trans Neural Netw Learn Syst. 2024 Apr;35(4):4656-4667. doi: 10.1109/TNNLS.2022.3223018. Epub 2024 Apr 4.
9. Cosine similarity-guided knowledge distillation for robust object detectors.
   Sci Rep. 2024 Aug 14;14(1):18888. doi: 10.1038/s41598-024-69813-6.
10. Multi-view Teacher-Student Network.
   Neural Netw. 2022 Feb;146:69-84. doi: 10.1016/j.neunet.2021.11.002. Epub 2021 Nov 15.