Jiang Runqing, Yan Yan, Xue Jing-Hao, Chen Si, Wang Nannan, Wang Hanzi
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):939-952. doi: 10.1109/TNNLS.2023.3335829. Epub 2025 Jan 7.
Knowledge distillation (KD), which aims at transferring the knowledge from a complex network (a teacher) to a simpler and smaller network (a student), has received considerable attention in recent years. Typically, most existing KD methods work on well-labeled data. Unfortunately, real-world data inevitably involve noisy labels, leading to performance deterioration of these methods. In this article, we study a little-explored but important problem: KD with noisy labels. To this end, we propose a novel KD method, called ambiguity-guided mutual label refinery KD (AML-KD), to train the student model in the presence of noisy labels. Specifically, based on the pretrained teacher model, we introduce a two-stage label refinery framework to refine labels gradually. In the first stage, we perform label propagation (LP) with small-loss selection guided by the teacher model, improving the learning capability of the student model. In the second stage, we perform mutual LP between the teacher and student models in a mutually beneficial manner. During the label refinery, an ambiguity-aware weight estimation (AWE) module is developed to address the problem of ambiguous samples, avoiding overfitting to these samples. One distinct advantage of AML-KD is that it is capable of learning a high-accuracy, low-cost student model under label noise. The experimental results on synthetic and real-world noisy datasets show the effectiveness of our AML-KD against state-of-the-art KD methods and label noise learning (LNL) methods. Code is available at https://github.com/Runqing-forMost/AML-KD.
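To make the small-loss selection step concrete, below is a minimal PyTorch sketch of how a pretrained teacher can flag likely-clean samples by keeping the fraction with the smallest per-sample cross-entropy loss (the classic small-loss criterion named in the abstract). The function name, the keep_ratio parameter, and the toy demo model are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def small_loss_selection(teacher, images, noisy_labels, keep_ratio=0.7):
        # Treat the keep_ratio fraction of samples with the smallest
        # teacher cross-entropy loss as likely clean (small-loss
        # criterion); returns the indices of the kept samples.
        teacher.eval()
        with torch.no_grad():
            logits = teacher(images)
            losses = F.cross_entropy(logits, noisy_labels, reduction="none")
        num_keep = max(1, int(keep_ratio * losses.numel()))
        return torch.argsort(losses)[:num_keep]  # smallest losses first

    # Toy demo: a random linear "teacher" on synthetic data.
    teacher = torch.nn.Linear(32, 10)
    images = torch.randn(8, 32)
    noisy_labels = torch.randint(0, 10, (8,))
    print(small_loss_selection(teacher, images, noisy_labels))

In a full pipeline, label propagation would then spread the (presumed clean) labels of the selected samples to the rest of the batch before training the student.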
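The abstract does not spell out the AWE formulation, so the following sketch is a hypothetical stand-in for the ambiguity-aware weighting idea: it downweights samples whose refined soft labels are close to uniform, using normalized prediction entropy as an ambiguity score. The function name and the specific weighting rule are assumptions, not the paper's AWE module.

    import torch

    def ambiguity_weights(soft_labels, eps=1e-12):
        # Weight each sample by 1 - normalized entropy of its refined
        # soft label: near-one-hot labels get weight ~1, near-uniform
        # (ambiguous) labels get weight ~0. Illustrative stand-in only.
        num_classes = soft_labels.size(1)
        entropy = -(soft_labels * (soft_labels + eps).log()).sum(dim=1)
        return 1.0 - entropy / torch.log(torch.tensor(float(num_classes)))

    # Example: a confident label vs. a maximally ambiguous one.
    labels = torch.tensor([[0.97, 0.01, 0.01, 0.01],
                           [0.25, 0.25, 0.25, 0.25]])
    print(ambiguity_weights(labels))  # ~[0.88, 0.00]

Such weights would scale the per-sample distillation loss so that ambiguous samples contribute less, in the spirit of avoiding overfitting to them.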