Improving Knowledge Distillation With a Customized Teacher.

Author Information

Tan Chao, Liu Jie

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2290-2299. doi: 10.1109/TNNLS.2022.3189680. Epub 2024 Feb 5.

DOI: 10.1109/TNNLS.2022.3189680
PMID: 35877790
Abstract

Knowledge distillation (KD) is a widely used approach for transferring knowledge from a cumbersome network (the teacher) to a lightweight network (the student). However, even when different teachers reach similar accuracies, the accuracy of a fixed student distilled from them can differ significantly. We find that teachers whose secondary soft probabilities are more dispersed are better suited to the teaching role. An indicator, the standard deviation σ of the secondary soft probabilities, is therefore introduced to choose the teacher. Moreover, to make a teacher's secondary soft probabilities more dispersed, a novel method, dubbed pretraining the teacher under dual supervision (PTDS), is proposed. In addition, we put forward an asymmetrical transformation function (ATF) to further increase the dispersion of the pretrained teacher's secondary soft probabilities. The combination of PTDS and ATF is termed knowledge distillation with a customized teacher (KDCT). Extensive experiments and analyses on three computer vision tasks, including image classification, transfer learning, and semantic segmentation, substantiate the effectiveness of KDCT.
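The abstract gives no formulas, so the following is a minimal sketch of how the selection indicator σ could be computed, assuming "secondary soft probabilities" means the temperature-softened class probabilities with the largest (primary) one removed. The function name secondary_soft_std, the temperature value, and the teacher-ranking usage are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def secondary_soft_std(logits: torch.Tensor, temperature: float = 4.0) -> torch.Tensor:
    """Per-sample standard deviation (sigma) of a teacher's secondary soft probabilities.

    'Secondary' is assumed here to mean all temperature-softened class
    probabilities except the largest (primary) one; the paper's exact
    definition may differ.
    """
    # Temperature-scaled softmax yields the soft probability distribution.
    probs = F.softmax(logits / temperature, dim=-1)        # shape (N, C)
    # Sort each row in descending order and drop the primary probability.
    sorted_probs, _ = probs.sort(dim=-1, descending=True)
    secondary = sorted_probs[:, 1:]                        # shape (N, C-1)
    # Dispersion indicator: standard deviation of what remains.
    return secondary.std(dim=-1)                           # shape (N,)

# Hypothetical usage: rank two candidate teachers by their mean sigma on a
# validation batch and prefer the more dispersed one.
logits_a = torch.randn(128, 100)  # stand-in for teacher A's validation logits
logits_b = torch.randn(128, 100)  # stand-in for teacher B's validation logits
sigma_a = secondary_soft_std(logits_a).mean()
sigma_b = secondary_soft_std(logits_b).mean()
print(f"sigma_A={sigma_a:.4f}, sigma_B={sigma_b:.4f}")
```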


Similar Articles

1. Improving Knowledge Distillation With a Customized Teacher.
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2290-2299. doi: 10.1109/TNNLS.2022.3189680. Epub 2024 Feb 5.
2. FCKDNet: A Feature Condensation Knowledge Distillation Network for Semantic Segmentation.
Entropy (Basel). 2023 Jan 7;25(1):125. doi: 10.3390/e25010125.
3. Teacher-student complementary sample contrastive distillation.
Neural Netw. 2024 Feb;170:176-189. doi: 10.1016/j.neunet.2023.11.036. Epub 2023 Nov 17.
4. Multi-teacher knowledge distillation based on joint Guidance of Probe and Adaptive Corrector.
Neural Netw. 2023 Jul;164:345-356. doi: 10.1016/j.neunet.2023.04.015. Epub 2023 Apr 26.
5. Restructuring the Teacher and Student in Self-Distillation.
IEEE Trans Image Process. 2024;33:5551-5563. doi: 10.1109/TIP.2024.3463421. Epub 2024 Oct 4.
6. RCKD: Response-Based Cross-Task Knowledge Distillation for Pathological Image Analysis.
Bioengineering (Basel). 2023 Nov 2;10(11):1279. doi: 10.3390/bioengineering10111279.
7. Generalized Knowledge Distillation via Relationship Matching.
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):1817-1834. doi: 10.1109/TPAMI.2022.3160328. Epub 2023 Jan 6.
8. A General Dynamic Knowledge Distillation Method for Visual Analytics.
IEEE Trans Image Process. 2022 Oct 13;PP. doi: 10.1109/TIP.2022.3212905.
9. Memory-Replay Knowledge Distillation.
Sensors (Basel). 2021 Apr 15;21(8):2792. doi: 10.3390/s21082792.
10. MSKD: Structured knowledge distillation for efficient medical image segmentation.
Comput Biol Med. 2023 Sep;164:107284. doi: 10.1016/j.compbiomed.2023.107284. Epub 2023 Aug 2.

Cited By

1. Uncertainty-aware Topological Persistence Guided Knowledge Distillation on Wearable Sensor Data.
IEEE Internet Things J. 2024 Sep 15;11(18):30413-30429. doi: 10.1109/jiot.2024.3412980. Epub 2024 Jun 11.