Li Chuan, Teng Xiao, Ding Yan, Lan Long
College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China.
Sensors (Basel). 2024 Jun 3;24(11):3617. doi: 10.3390/s24113617.
Most logit-based knowledge distillation methods transfer soft labels from the teacher model to the student model via a Kullback-Leibler divergence based on softmax, an exponential normalization function. However, the exponential nature of softmax tends to prioritize the largest class (the target class) while neglecting smaller ones (the non-target classes), leading to an oversight of the non-target classes' significance. To address this issue, we propose Non-Target-Class-Enhanced Knowledge Distillation (NTCE-KD) to amplify the role of non-target classes in terms of both magnitude and diversity. Specifically, we present a magnitude-enhanced Kullback-Leibler (MKL) divergence that multi-shrinks the target class to enhance the impact of non-target classes in terms of magnitude. Additionally, to enrich the diversity of non-target classes, we introduce a diversity-based data augmentation strategy (DDA), further enhancing overall performance. Extensive experiments on the CIFAR-100 and ImageNet-1k datasets demonstrate that non-target classes are of great significance and that our method achieves state-of-the-art performance across a wide range of teacher-student pairs.
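To illustrate the intuition behind the abstract (not the paper's actual MKL formulation, whose details are not given here), the sketch below shows how shrinking the target-class logit before softmax redistributes probability mass toward the non-target classes. The logit values and the shrink factor are hypothetical:

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits; index 0 is the target class.
logits = [8.0, 3.0, 2.0, 1.0]
p = softmax(logits)

# Shrink only the target-class logit (a simplified stand-in for the
# paper's multi-shrinking operation on the target class).
shrunk = [logits[0] * 0.5] + logits[1:]
q = softmax(shrunk)

# The non-target classes now carry a larger share of the distribution,
# so a KL-based distillation loss weights them more heavily.
print(f"non-target mass before: {sum(p[1:]):.4f}, after: {sum(q[1:]):.4f}")
```

Because softmax is exponential, even a modest reduction of the dominant logit substantially increases the relative probabilities of the remaining classes, which is the magnitude effect the abstract attributes to MKL.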