Tuning Vision-Language Models With Multiple Prototypes Clustering.

Authors

Guo Meng-Hao, Zhang Yi, Mu Tai-Jiang, Huang Sharon X, Hu Shi-Min

Publication

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11186-11199. doi: 10.1109/TPAMI.2024.3460180. Epub 2024 Nov 6.

DOI: 10.1109/TPAMI.2024.3460180
PMID: 39269797
Abstract

Benefiting from advances in large-scale pre-training, foundation models have demonstrated remarkable capability in fields such as natural language processing and computer vision. However, to achieve expert-level performance in specific applications, such models often need to be fine-tuned with domain-specific knowledge. In this paper, we focus on enabling vision-language models to unleash more potential for visual understanding tasks under few-shot tuning. Specifically, we propose a novel adapter, dubbed ClusterAdapter, which is based on a trainable multiple-prototypes clustering algorithm, for tuning the CLIP model. It can not only alleviate the concern of catastrophic forgetting of foundation models by introducing anchors to inherit common knowledge, but also improve the utilization efficiency of few annotated samples by bringing in clustering and domain priors, thereby improving the performance of few-shot tuning. We have conducted extensive experiments on 11 common classification benchmarks. The results show our method significantly surpasses the original CLIP and achieves state-of-the-art (SOTA) performance under all benchmarks and settings. For example, under the 16-shot setting, our method exhibits a remarkable improvement over the original CLIP by 19.6%, and also surpasses TIP-Adapter and GraphAdapter by 2.7% and 2.2%, respectively, in terms of average accuracy across the 11 benchmarks.
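The abstract's core mechanism — representing each class by several clustered prototypes built from its few-shot features and classifying a query by similarity to the best-matching prototype — can be sketched as below. This is a minimal illustration using spherical k-means over unit-norm feature vectors; the function names and the simple mean-update are assumptions for exposition, not the paper's actual adapter, which additionally uses trainable anchors and domain priors on CLIP features.

```python
import numpy as np

def class_prototypes(features, k, n_iter=10, seed=0):
    """Cluster one class's (n, d) L2-normalized features into up to k
    prototypes with a few rounds of spherical k-means (cosine similarity)."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # initialize prototypes from randomly chosen sample features
    protos = features[rng.choice(n, size=min(k, n), replace=False)]
    for _ in range(n_iter):
        # assign each feature to its most similar prototype (cosine = dot
        # product, since everything is unit-normalized)
        assign = (features @ protos.T).argmax(axis=1)
        for j in range(protos.shape[0]):
            members = features[assign == j]
            if len(members):
                p = members.mean(axis=0)
                protos[j] = p / np.linalg.norm(p)  # re-project to unit sphere
    return protos

def predict(query, protos_per_class):
    """Score a unit-norm query feature against every class by its
    best-matching prototype; return the index of the winning class."""
    scores = [float((query @ protos.T).max()) for protos in protos_per_class]
    return int(np.argmax(scores))
```

In a real few-shot pipeline the features would come from CLIP's image encoder; using several prototypes per class, rather than one mean vector, lets a multi-modal class (e.g. images of a category under different viewpoints) be covered by distinct cluster centers.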


Similar Articles

1. Tuning Vision-Language Models With Multiple Prototypes Clustering.
   IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11186-11199. doi: 10.1109/TPAMI.2024.3460180. Epub 2024 Nov 6.
2. Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification.
   Sensors (Basel). 2024 Jun 4;24(11):3624. doi: 10.3390/s24113624.
3. Enhancing Few-Shot CLIP With Semantic-Aware Fine-Tuning.
   IEEE Trans Neural Netw Learn Syst. 2024 Aug 26;PP. doi: 10.1109/TNNLS.2024.3443394.
4. Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study.
   JMIR AI. 2023 May 4;2:e44293. doi: 10.2196/44293.
5. Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders.
   Sci Rep. 2024 Oct 5;14(1):23199. doi: 10.1038/s41598-024-73695-z.
6. Enhancing Few-Shot Out-of-Distribution Detection With Pre-Trained Model Features.
   IEEE Trans Image Process. 2024;33:6309-6323. doi: 10.1109/TIP.2024.3468874. Epub 2024 Dec 27.
7. Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images.
   Med Image Anal. 2024 Oct;97:103258. doi: 10.1016/j.media.2024.103258. Epub 2024 Jul 4.
8. Learning Domain Invariant Prompt for Vision-Language Models.
   IEEE Trans Image Process. 2024;33:1348-1360. doi: 10.1109/TIP.2024.3362062. Epub 2024 Feb 14.
9. The impact of fine-tuning paradigms on unknown plant diseases recognition.
   Sci Rep. 2024 Aug 2;14(1):17900. doi: 10.1038/s41598-024-66958-2.
10. OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models.
   Sci Rep. 2024 Jun 19;14(1):14156. doi: 10.1038/s41598-024-64827-6.