National University of Defense Technology, Changsha 410000, China; State Key Laboratory of Complex & Critical Software Environment, Changsha 410000, China.
Neural Netw. 2024 Nov;179:106513. doi: 10.1016/j.neunet.2024.106513. Epub 2024 Jul 6.
Class-Incremental Learning (CIL) is challenging due to catastrophic forgetting (CF), which escalates in exemplar-free scenarios. To mitigate CF, Knowledge Distillation (KD), which leverages the old model as a teacher, has been widely employed in CIL. However, based on a case study, our investigation reveals that the teacher model is over-confident on unseen new samples. In this article, we conduct empirical experiments and provide theoretical analysis to investigate the over-confidence phenomenon and the impact of KD in exemplar-free CIL, where access to old samples is unavailable. Building on our analysis, we propose a novel approach, Learning with Humbler Teacher (LwHT), which systematically selects an appropriate checkpoint model as a humbler teacher to mitigate CF. Furthermore, we explore utilizing the nuclear norm to obtain an appropriate temporal ensemble that enhances model stability. Notably, LwHT outperforms the state-of-the-art approach by significant margins of 10.41%, 6.56%, and 4.31% across various settings while demonstrating superior model plasticity.
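To make the two ingredients mentioned in the abstract concrete, below is a minimal, illustrative sketch (not the authors' released code) of (i) a standard temperature-softened KD loss from a checkpoint "humbler teacher" and (ii) a nuclear-norm score used to weight a temporal ensemble of checkpoints. Names such as `tau`, `checkpoint_models`, and the softmax-based weighting are assumptions for illustration, not identifiers or design details taken from the paper.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, tau=2.0):
    """Standard KD loss: KL divergence between temperature-softened distributions."""
    log_p_student = F.log_softmax(student_logits / tau, dim=1)
    p_teacher = F.softmax(teacher_logits / tau, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2


def nuclear_norm_score(logits):
    """Nuclear norm of the batch prediction matrix; larger values indicate
    more diverse (less collapsed) predictions across the batch."""
    probs = F.softmax(logits, dim=1)
    return torch.linalg.matrix_norm(probs, ord="nuc")


@torch.no_grad()
def weighted_temporal_ensemble(checkpoint_models, x):
    """Combine checkpoint predictions, weighting each checkpoint by its
    nuclear-norm score (illustrative weighting scheme, assumed here)."""
    all_logits, scores = [], []
    for model in checkpoint_models:
        logits = model(x)
        all_logits.append(logits)
        scores.append(nuclear_norm_score(logits))
    weights = torch.softmax(torch.stack(scores), dim=0)
    return sum(w * l for w, l in zip(weights, all_logits))
```

In this sketch, a higher nuclear norm of the softmax output matrix is treated as a proxy for richer, less over-confident predictions, so such checkpoints receive larger ensemble weights; the paper's actual selection and ensembling criteria may differ.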