Li Xiaorong, Wang Shipeng, Sun Jian, Xu Zongben
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12618-12634. doi: 10.1109/TPAMI.2023.3271626. Epub 2023 Sep 5.
Deep neural networks suffer from catastrophic forgetting when trained on sequential tasks in continual learning. Many methods mitigate catastrophic forgetting by storing data from previous tasks, which is often prohibited in real-world applications due to privacy and security concerns. In this paper, we consider a realistic continual learning setting in which training data from previous tasks are unavailable and memory resources are limited. We contribute a novel knowledge distillation-based method in an information-theoretic framework that maximizes the mutual information between the outputs of the previously learned and current networks. Because the mutual information is intractable to compute, we instead maximize its variational lower bound, where the covariance of the variational distribution is modeled by a graph convolutional network. The inaccessibility of previous task data is tackled by Taylor expansion, yielding a novel regularizer in the network training loss for continual learning. The regularizer relies on compressed gradients of the network parameters, avoiding the storage of both previous task data and previously learned networks. Additionally, we employ a self-supervised learning technique to learn effective features, which further improves continual learning performance. We conduct extensive experiments on image classification and semantic segmentation, and the results show that our method achieves state-of-the-art performance on continual learning benchmarks.
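The core idea in the abstract can be sketched concretely. A standard variational lower bound on mutual information (the Barber–Agakov bound) states that I(T; S) ≥ H(T) + E[log q(T | S)] for any variational distribution q, so maximizing the expected log-density E[log q(T | S)] tightens a lower bound on the mutual information between teacher outputs T and student outputs S. The minimal sketch below assumes a *diagonal* Gaussian q with mean equal to the student's output, which is a simplification: the paper models a full covariance with a graph convolutional network, and the function name and signature here are hypothetical illustrations, not the authors' implementation.

```python
import math

def gaussian_logdensity_term(y_teacher, y_student, var):
    """Average log q(y_teacher | y_student) over a batch, where q is a
    diagonal Gaussian with mean y_student and per-dimension variances `var`.

    By the Barber-Agakov bound, I(T; S) >= H(T) + E[log q(T | S)];
    since H(T) does not depend on the current network, maximizing this
    term maximizes a variational lower bound on the mutual information
    between teacher and student outputs.
    """
    batch = len(y_teacher)
    total = 0.0
    for t_row, s_row in zip(y_teacher, y_student):
        for t, s, v in zip(t_row, s_row, var):
            # per-dimension Gaussian log-density of the teacher output
            # under a Gaussian centered at the student output
            total += -0.5 * (math.log(2 * math.pi * v) + (t - s) ** 2 / v)
    return total / batch
```

With unit variances this reduces to a constant minus half the squared-error distillation loss, which makes the connection between MI maximization and conventional knowledge distillation explicit; the learned (non-diagonal) covariance in the paper generalizes this by weighting and correlating output dimensions.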