Zhao Cairong, Wang Yubin, Jiang Xinyang, Shen Yifei, Song Kaitao, Li Dongsheng, Miao Duoqian
IEEE Trans Image Process. 2024;33:1348-1360. doi: 10.1109/TIP.2024.3362062. Epub 2024 Feb 14.
Prompt learning stands out as one of the most efficient approaches for adapting powerful vision-language foundation models such as CLIP to downstream datasets, tuning learnable prompt vectors with very few samples. However, despite its success in achieving remarkable performance on in-domain data, prompt learning still faces the significant challenge of generalizing effectively to novel classes and domains. Some existing methods address this concern by dynamically generating distinct prompts for different domains, yet they overlook the inherent potential of prompts to generalize across unseen domains. To address these limitations, our study introduces an innovative prompt learning paradigm, called MetaPrompt, that directly learns domain-invariant prompts in few-shot scenarios. To facilitate learning prompts for image and text inputs independently, we present a dual-modality prompt tuning network comprising two pairs of coupled encoders. Our study centers on an alternating episodic training algorithm to enrich the generalization capacity of the learned prompts. In contrast to traditional episodic training algorithms, our approach incorporates both in-domain updates and domain-split updates in a batch-wise manner. For in-domain updates, we introduce a novel asymmetric contrastive learning paradigm in which representations from the pre-trained encoder serve as supervision to regularize prompts from the prompted encoder. To enhance performance on out-of-domain distributions, we propose a domain-split optimization during domain-split updates, applied to visual prompts for cross-domain tasks or to textual prompts for cross-class tasks. Extensive experiments across 11 datasets for base-to-new generalization and 4 datasets for domain generalization demonstrate favorable performance.
Compared with the state-of-the-art method, MetaPrompt achieves an absolute gain of 1.02% on the overall harmonic mean in base-to-new generalization and consistently demonstrates superiority over all benchmarks in domain generalization.
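The asymmetric contrastive objective used in the in-domain updates can be sketched as follows. This is a minimal illustration assuming a standard InfoNCE-style formulation in which features from the frozen pre-trained encoder act as fixed supervision targets for the prompted encoder; the function name, temperature value, and feature shapes are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def asymmetric_contrastive_loss(prompted, frozen, temperature=0.07):
    """Hypothetical sketch of the in-domain asymmetric contrastive update.

    `prompted`: features from the prompted (tunable) encoder, shape (N, D).
    `frozen`:   features from the frozen pre-trained encoder, shape (N, D),
                treated as fixed supervision (no gradient would flow here).
    """
    # L2-normalize each feature row, as is standard for CLIP-style features.
    p = prompted / np.linalg.norm(prompted, axis=1, keepdims=True)
    f = frozen / np.linalg.norm(frozen, axis=1, keepdims=True)

    # Similarity logits between prompted and frozen views of the batch.
    logits = p @ f.T / temperature

    # Cross-entropy with matching (diagonal) pairs as positives: the frozen
    # representation of sample i supervises the prompted representation of i.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(p))
    return -log_probs[idx, idx].mean()
```

In a full training loop this loss would be minimized only with respect to the prompt parameters feeding the prompted encoder, so the pre-trained representations anchor the prompts without being updated themselves.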