Continual Learning With Knowledge Distillation: A Survey.

Author Information

Li Songze, Su Tonghua, Zhang Xu-Yao, Wang Zhongjie

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Oct 18;PP. doi: 10.1109/TNNLS.2024.3476068.

DOI: 10.1109/TNNLS.2024.3476068
PMID: 39423075
Abstract

The foremost challenge in continual learning is to mitigate catastrophic forgetting, allowing a model to retain knowledge of previous tasks while learning new tasks. Knowledge distillation (KD), a form of regularization, has gained significant attention for its ability to maintain a model's performance on previous tasks by mimicking the outputs of earlier models during the learning of new tasks, thus reducing forgetting. This article offers a comprehensive survey of continual learning methods employing KD within the realm of image classification. We provide a detailed analysis of how KD is utilized in continual learning methods, categorizing its application into three distinct paradigms. Besides, we classify these methods based on the type of knowledge source used and thoroughly examine how KD consolidates memory in continual learning from the perspective of loss functions. In addition, we have conducted extensive experiments on CIFAR-100, TinyImageNet, and ImageNet-100 across ten KD-integrated continual learning methods to analyze the role of KD in continual learning, and we have further discussed its effectiveness in other continual learning tasks. Our extensive experimental evidence demonstrates that KD plays a crucial role in mitigating forgetting in continual learning and substantiates that, when used with data replay, classification bias adversely affects the effectiveness of KD, whereas employing a separated softmax loss can significantly enhance its efficacy.
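To ground the loss-function perspective the abstract describes, the following is a minimal PyTorch sketch of the logit-distillation paradigm combined with a separated softmax classification loss. It is an illustrative sketch under assumptions, not the authors' reference implementation: the names (kd_loss, train_step), the single shared classification head, and the hyperparameters (temperature T, weight lambda_kd) are all hypothetical.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Hinton-style distillation: soften both output distributions with
    # temperature T, then minimize their KL divergence. The T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)

def train_step(model, old_model, x, y, n_old, lambda_kd=1.0):
    # One step on a new-task batch (labels y are >= n_old). old_model is
    # a frozen snapshot of the network taken before the current task began.
    logits = model(x)                      # shape [B, n_old + n_new]
    with torch.no_grad():
        teacher_logits = old_model(x)

    # Separated softmax: cross-entropy is computed over the new-class
    # block only, so old-class logits receive supervision solely from
    # the distillation term below.
    ce = F.cross_entropy(logits[:, n_old:], y - n_old)
    # Distilling the old-class logits toward the frozen teacher's outputs
    # is the regularization that mitigates catastrophic forgetting.
    kd = kd_loss(logits[:, :n_old], teacher_logits[:, :n_old])
    return ce + lambda_kd * kd

The separated cross-entropy mirrors the abstract's finding: with data replay, a joint softmax over all classes biases the classifier toward new classes and undercuts KD, whereas confining the classification loss to the new-class block leaves the old-class logits entirely to the distillation term.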


Similar Articles

1. Continual Learning With Knowledge Distillation: A Survey.
   IEEE Trans Neural Netw Learn Syst. 2024 Oct 18;PP. doi: 10.1109/TNNLS.2024.3476068.
2. Unveiling the Tapestry: The Interplay of Generalization and Forgetting in Continual Learning.
   IEEE Trans Neural Netw Learn Syst. 2025 Mar 21;PP. doi: 10.1109/TNNLS.2025.3546269.
3. Subspace distillation for continual learning.
   Neural Netw. 2023 Oct;167:65-79. doi: 10.1016/j.neunet.2023.07.047. Epub 2023 Aug 6.
4. Continual Learning by Contrastive Learning of Regularized Classes in Multivariate Gaussian Distributions.
   Int J Neural Syst. 2025 Jun;35(6):2550025. doi: 10.1142/S012906572550025X. Epub 2025 Apr 4.
5. Variational Data-Free Knowledge Distillation for Continual Learning.
   IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12618-12634. doi: 10.1109/TPAMI.2023.3271626. Epub 2023 Sep 5.
6. Mitigating carbon footprint for knowledge distillation based deep learning model compression.
   PLoS One. 2023 May 15;18(5):e0285668. doi: 10.1371/journal.pone.0285668. eCollection 2023.
7. Memory-Replay Knowledge Distillation.
   Sensors (Basel). 2021 Apr 15;21(8):2792. doi: 10.3390/s21082792.
8. Enhancing consistency and mitigating bias: A data replay approach for incremental learning.
   Neural Netw. 2025 Apr;184:107053. doi: 10.1016/j.neunet.2024.107053. Epub 2024 Dec 20.
9. A Continual Learning Survey: Defying Forgetting in Classification Tasks.
   IEEE Trans Pattern Anal Mach Intell. 2022 Jul;44(7):3366-3385. doi: 10.1109/TPAMI.2021.3057446. Epub 2022 Jun 3.
10. Continual learning with attentive recurrent neural networks for temporal data classification.
    Neural Netw. 2023 Jan;158:171-187. doi: 10.1016/j.neunet.2022.10.031. Epub 2022 Nov 11.

Cited By

1. KD_MultiSucc: incorporating multi-teacher knowledge distillation and word embeddings for cross-species prediction of protein succinylation sites.
   Biol Methods Protoc. 2025 May 28;10(1):bpaf041. doi: 10.1093/biomethods/bpaf041. eCollection 2025.
2. Confidence-Based, Collaborative, Distributed Continual Learning Framework for Non-Intrusive Load Monitoring in Smart Grids.
   Sensors (Basel). 2025 Jun 11;25(12):3667. doi: 10.3390/s25123667.