
Dual-Level Knowledge Distillation via Knowledge Alignment and Correlation

Authors

Ding Fei, Yang Yin, Hu Hongxin, Krovi Venkat, Luo Feng

Publication

IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2425-2435. doi: 10.1109/TNNLS.2022.3190166. Epub 2024 Feb 5.

Abstract

Knowledge distillation (KD) has become a widely used technique for model compression and knowledge transfer. We find that the standard KD method performs knowledge alignment on individual samples only indirectly, via class prototypes, and neglects the structural knowledge between different samples, namely, knowledge correlation. Although recent contrastive learning-based distillation methods can be decomposed into knowledge alignment and correlation, their correlation objectives undesirably push apart representations of samples from the same class, leading to inferior distillation results. To improve the distillation performance, we propose a novel knowledge correlation objective and introduce dual-level knowledge distillation (DLKD), which explicitly combines knowledge alignment and correlation instead of relying on a single contrastive objective. We show that both knowledge alignment and correlation are necessary to improve distillation performance; in particular, knowledge correlation can serve as an effective regularizer for learning generalized representations. The proposed DLKD is task-agnostic and model-agnostic, and enables effective knowledge transfer from supervised or self-supervised pretrained teachers to students. Experiments show that DLKD outperforms other state-of-the-art methods across a wide range of experimental settings, including: 1) pretraining strategies; 2) network architectures; 3) datasets; and 4) tasks.
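The abstract does not spell out the exact objectives, so the PyTorch sketch below only illustrates the general idea of combining a per-sample alignment term with a cross-sample correlation term. The function name, the cosine-similarity and MSE choices, and the loss weights are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of a dual-level distillation loss, assuming (hypothetically)
# that "knowledge alignment" is per-sample matching between student and teacher
# embeddings, and "knowledge correlation" is matching the pairwise similarity
# structure across a batch. All names and weights are illustrative.
import torch
import torch.nn.functional as F


def dual_level_distillation_loss(student_feats, teacher_feats,
                                 align_weight=1.0, corr_weight=1.0):
    """student_feats, teacher_feats: (batch, dim) embeddings from the two networks."""
    # L2-normalize so both terms operate on unit-length representations.
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)

    # Knowledge alignment (per-sample): pull each student embedding toward
    # the teacher embedding of the same input (1 - cosine similarity).
    align_loss = (1.0 - (s * t).sum(dim=1)).mean()

    # Knowledge correlation (cross-sample): match the batch-level similarity
    # matrices of student and teacher, so the relational structure between
    # different samples is preserved rather than pushed apart.
    s_sim = s @ s.t()   # (batch, batch) student similarities
    t_sim = t @ t.t()   # (batch, batch) teacher similarities
    corr_loss = F.mse_loss(s_sim, t_sim)

    return align_weight * align_loss + corr_weight * corr_loss


# Example usage with random tensors standing in for network outputs:
# loss = dual_level_distillation_loss(torch.randn(32, 128), torch.randn(32, 128))
```

The point of the sketch is that the two terms are kept separate and summed, rather than folded into a single contrastive objective that would treat other same-class samples as negatives.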

