University of Pisa, Largo B. Pontecorvo, 3, Pisa, 56127, Italy.
Neural Netw. 2024 Nov;179:106492. doi: 10.1016/j.neunet.2024.106492. Epub 2024 Jul 1.
Pre-trained models are commonly used in Continual Learning to initialize the model before training on the stream of non-stationary data. However, pre-training itself is rarely applied during Continual Learning. We investigate the characteristics of the Continual Pre-Training scenario, where a model is continually pre-trained on a stream of incoming data and only later fine-tuned to different downstream tasks. We introduce an evaluation protocol for Continual Pre-Training that monitors forgetting against a Forgetting Control dataset not present in the continual stream. We disentangle the impact on forgetting of three main factors: the input modality (NLP, Vision), the architecture type (Transformer, ResNet), and the pre-training protocol (supervised, self-supervised). Moreover, we propose a Sample-Efficient Pre-training method (SEP) that speeds up the pre-training phase. We show that the pre-training protocol is the most important factor accounting for forgetting. Surprisingly, we find that self-supervised continual pre-training, in both NLP and Vision, is sufficient to mitigate forgetting without the use of any Continual Learning strategy. Other factors, such as model depth, input modality, and architecture type, are not as crucial.
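The abstract outlines the evaluation protocol at a high level: continually pre-train a backbone on a stream of data chunks and, after each chunk, fine-tune a copy on a Forgetting Control task that never appears in the stream, tracking how its performance evolves. The sketch below is not the authors' code; it is a minimal, toy illustration of that loop. The model (a small MLP), the synthetic datasets, the supervised objective used as a stand-in for the pre-training protocol, and helper names such as make_toy_dataset are all illustrative assumptions.

```python
# Minimal sketch of a Continual Pre-Training evaluation loop (illustrative only).
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def make_toy_dataset(n=256, dim=32, classes=4, seed=0):
    # Synthetic stand-in for a real pre-training chunk or downstream task.
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, dim, generator=g)
    y = torch.randint(0, classes, (n,), generator=g)
    return TensorDataset(x, y)

def train(model, dataset, epochs=1, lr=1e-3):
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

@torch.no_grad()
def accuracy(model, dataset):
    loader = DataLoader(dataset, batch_size=64)
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# Backbone that is continually pre-trained on the stream (the paper uses
# Transformers/ResNets with supervised or self-supervised objectives;
# here a toy MLP with a supervised loss keeps the sketch self-contained).
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))

stream = [make_toy_dataset(seed=s) for s in range(3)]  # continual pre-training stream
fc_train = make_toy_dataset(seed=100)  # Forgetting Control task, not in the stream
fc_test = make_toy_dataset(seed=101)

for t, chunk in enumerate(stream):
    # 1) Continual pre-training step on the current chunk.
    train(backbone, chunk, epochs=1)

    # 2) Fine-tune a *copy* on the Forgetting Control task, so downstream
    #    training never leaks back into the continually pre-trained backbone.
    probe = copy.deepcopy(backbone)
    train(probe, fc_train, epochs=1)

    # 3) A drop in control accuracy across chunks signals forgetting.
    print(f"after chunk {t}: forgetting-control acc = {accuracy(probe, fc_test):.3f}")
```

Under this protocol, forgetting shows up as a decline in the control-task accuracy as more of the stream is consumed; the paper's finding is that with self-supervised continual pre-training this decline largely disappears even without dedicated Continual Learning strategies.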