Guizhou University, School of Mechanical Engineering, Guiyang, 550025, Guizhou, China.
Guizhou University, School of Mechanical Engineering, Guiyang, 550025, Guizhou, China; Guizhou University, State Key Laboratory of Public Big Data Ministry of Education, Guiyang, 550025, Guizhou, China.
Neural Netw. 2024 Dec;180:106685. doi: 10.1016/j.neunet.2024.106685. Epub 2024 Aug 31.
Humans can continually learn new knowledge. For artificial intelligence, however, attempting to learn new knowledge continuously usually results in catastrophic forgetting. Existing regularization-based and dynamic-structure-based approaches have shown great potential for alleviating this problem, but they have certain limitations: they usually do not fully address the problem of incompatible feature embeddings, tending to focus only on the features of new or previous classes rather than considering the model as a whole. We therefore propose a two-stage learning paradigm to solve the feature embedding incompatibility problem. In the first stage, we retain the previous model and freeze all of its parameters while dynamically expanding a new module to alleviate feature embedding incompatibility. In the second stage, a fusion knowledge distillation approach is used to compress the redundant feature dimensions. Moreover, we propose weight pruning and consolidation approaches to improve the efficiency of the model. Experimental results on the CIFAR-100, ImageNet-100 and ImageNet-1000 benchmark datasets show that the proposed approaches achieve the best performance among all compared approaches; for example, on the ImageNet-100 dataset, the maximal accuracy improvement is 5.08%. Code is available at https://github.com/ybyangjing/CIL-FCE.
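The abstract does not give implementation details, but the second-stage distillation step rests on a standard ingredient: matching a trainable student's temperature-softened predictions to those of a frozen teacher. The sketch below is illustrative only (all names are hypothetical and not taken from the CIL-FCE repository); it shows a Hinton-style distillation loss, not the authors' exact fusion variant.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2
    (the usual Hinton-style scaling so gradients stay comparable)."""
    p = softmax(teacher_logits, temperature)  # frozen teacher targets
    q = softmax(student_logits, temperature)  # trainable student outputs
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return float(kl.mean()) * temperature ** 2

# A student that matches the teacher exactly incurs zero loss:
t = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(t, t))  # -> 0.0
```

In a two-stage setting like the one described, the teacher would be the combined frozen-old-model-plus-new-module network, and minimizing this loss trains a smaller student that discards redundant feature dimensions.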