Li Yukun, Pang Guansong, Suo Wei, Jing Chenchen, Xi Yuling, Liu Lingqiao, Chen Hao, Liang Guoqiang, Wang Peng
IEEE Trans Neural Netw Learn Syst. 2025 Aug;36(8):15137-15151. doi: 10.1109/TNNLS.2025.3547882.
This article investigates the problem of continual learning (CL) of vision-language models (VLMs) in open domains, where models must perform continual updating and inference on a stream of datasets from diverse seen and unseen domains with novel classes. Such a capability is crucial for applications in open environments, e.g., AI assistants, autonomous driving systems, and robotics. Current CL studies mostly focus on closed-set scenarios in a single domain with known classes. Large pretrained VLMs such as CLIP have shown exceptional zero-shot recognition capabilities, and several recent studies have leveraged the unique characteristics of VLMs to mitigate catastrophic forgetting in CL; however, they primarily target closed-set CL on single-domain datasets. Open-domain CL of large VLMs is significantly more challenging due to 1) large class correlations and domain gaps across the datasets and 2) the forgetting of both the zero-shot knowledge in the pretrained VLM and the knowledge learned from newly adapted datasets. In this work, we introduce a novel approach, termed CoLeCLIP, which learns an open-domain CL model based on CLIP. It addresses these challenges through joint learning of a set of task prompts and a cross-domain class vocabulary. Extensive experiments on 11 domain datasets show that CoLeCLIP achieves new state-of-the-art performance for open-domain CL under both task-incremental and class-incremental learning settings.
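The abstract describes CoLeCLIP only at a high level: per-task learnable prompts plus a class vocabulary of text embeddings shared across domains, on top of a frozen CLIP backbone. The sketch below is a minimal, purely illustrative PyTorch rendering of that idea, not the authors' implementation: the image encoder is stubbed with a frozen linear layer, and every name here (`OpenDomainCLModel`, `add_task`, `register_classes`) is hypothetical.

```python
# Illustrative sketch only (not the paper's code): frozen backbone,
# per-task learnable prompts, and a growing cross-domain class vocabulary.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OpenDomainCLModel(nn.Module):
    def __init__(self, embed_dim=512, prompt_len=4):
        super().__init__()
        # Stand-in for a pretrained CLIP image encoder, kept frozen so the
        # zero-shot knowledge in the backbone cannot be overwritten.
        self.image_encoder = nn.Linear(3 * 224 * 224, embed_dim)
        for p in self.image_encoder.parameters():
            p.requires_grad_(False)
        self.embed_dim = embed_dim
        self.prompt_len = prompt_len
        # One small learnable prompt per task, allocated as tasks arrive.
        self.task_prompts = nn.ParameterList()
        # Cross-domain class vocabulary: class name -> normalized text embedding.
        self.class_vocab = {}

    def add_task(self):
        # Allocate a fresh learnable prompt for a new task; older prompts stay fixed.
        self.task_prompts.append(
            nn.Parameter(torch.randn(self.prompt_len, self.embed_dim) * 0.02)
        )

    def register_classes(self, names, text_embeds):
        # Merge this task's class text embeddings into the shared vocabulary,
        # so classes from all seen domains remain queryable at inference time.
        for name, emb in zip(names, text_embeds):
            self.class_vocab[name] = F.normalize(emb, dim=-1)

    def forward(self, images, task_id):
        # Frozen image features, lightly modulated by the task prompt
        # (mean-pooled additive shift here, purely for illustration).
        feats = self.image_encoder(images.flatten(1))
        feats = feats + self.task_prompts[task_id].mean(dim=0)
        feats = F.normalize(feats, dim=-1)
        # Zero-shot-style classification against the whole vocabulary.
        names = list(self.class_vocab)
        weights = torch.stack([self.class_vocab[n] for n in names])
        return feats @ weights.t(), names


if __name__ == "__main__":
    model = OpenDomainCLModel()
    model.add_task()  # task 0 arrives
    model.register_classes(["cat", "dog"], torch.randn(2, 512))
    logits, names = model(torch.randn(8, 3, 224, 224), task_id=0)
    print(logits.shape, names)  # torch.Size([8, 2]) ['cat', 'dog']
```

The design choice this sketch tries to make concrete is that only the small prompt set and the vocabulary grow across tasks; freezing the backbone is what lets such an architecture preserve CLIP's zero-shot knowledge while still absorbing each newly arriving domain.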