
Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification

Authors

Kato Naoki, Nota Yoshiki, Aoki Yoshimitsu

Affiliations

Department of Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan.

Meidensha Corporation, Tokyo 141-6029, Japan.

Publication Information

Sensors (Basel). 2024 Jun 4;24(11):3624. doi: 10.3390/s24113624.

DOI: 10.3390/s24113624
PMID: 38894415
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11175357/
Abstract

Large vision-language models, such as Contrastive Vision-Language Pre-training (CLIP), pre-trained on large-scale image-text datasets, have demonstrated robust zero-shot transfer capabilities across various downstream tasks. To further enhance the few-shot recognition performance of CLIP, Tip-Adapter augments the CLIP model with an adapter that incorporates a key-value cache model constructed from the few-shot training set. This approach enables training-free adaptation and has shown significant improvements in few-shot recognition, especially with additional fine-tuning. However, the size of the adapter increases in proportion to the number of training samples, making it difficult to deploy in practical applications. In this paper, we propose a novel CLIP adaptation method, named Proto-Adapter, which employs a single-layer adapter of constant size regardless of the amount of training data and even outperforms Tip-Adapter. Proto-Adapter constructs the adapter's weights based on prototype representations for each class. By aggregating the features of the training samples, it successfully reduces the size of the adapter without compromising performance. Moreover, the performance of the model can be further enhanced by fine-tuning the adapter's weights using a distance margin penalty, which imposes additional inter-class discrepancy to the output logits. We posit that this training scheme allows us to obtain a model with a discriminative decision boundary even when trained with a limited amount of data. We demonstrate the effectiveness of the proposed method through extensive experiments of few-shot classification on diverse datasets.
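The abstract's core idea — replacing Tip-Adapter's per-sample key-value cache with one prototype per class, so the adapter stays a fixed-size single layer — can be sketched as below. This is an illustrative reconstruction, not the authors' code: the residual-style combination of zero-shot and adapter logits, and the `alpha`/`beta` hyperparameter names, follow the Tip-Adapter convention rather than the paper's exact formulation, and the distance margin penalty used for fine-tuning is omitted.

```python
import numpy as np

def proto_adapter_logits(image_feat, text_feats, train_feats, train_labels,
                         alpha=1.0, beta=5.5):
    """Training-free Proto-Adapter sketch (illustrative, not the paper's code).

    image_feat:   (d,)   L2-normalized CLIP image feature of the test image
    text_feats:   (C, d) L2-normalized CLIP text features, one per class
    train_feats:  (N, d) L2-normalized features of the few-shot training set
    train_labels: (N,)   integer class labels in [0, C)
    """
    C = text_feats.shape[0]
    # Build one prototype per class by averaging that class's features and
    # re-normalizing -- the adapter is (C, d), independent of the number of
    # training samples N, unlike Tip-Adapter's (N, d) cache.
    protos = np.stack([train_feats[train_labels == c].mean(axis=0)
                       for c in range(C)])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)

    zero_shot = text_feats @ image_feat        # CLIP zero-shot logits
    affinity = protos @ image_feat             # cosine similarity to prototypes
    adapter = np.exp(-beta * (1.0 - affinity)) # sharpened affinity scores
    return zero_shot + alpha * adapter         # blend with zero-shot logits
```

With unit-vector toy features, a test image aligned with class 0's prototype receives both a higher zero-shot score and a higher adapter score for class 0, so the blended logits favor the correct class.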


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39b4/11175357/b8ddd6c961b7/sensors-24-03624-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39b4/11175357/7b46cd0c6324/sensors-24-03624-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39b4/11175357/9728d9c1498a/sensors-24-03624-g003.jpg

Similar Articles

1. Proto-Adapter: Efficient Training-Free CLIP-Adapter for Few-Shot Image Classification. Sensors (Basel). 2024 Jun 4;24(11):3624. doi: 10.3390/s24113624.
2. Enhancing Few-Shot CLIP With Semantic-Aware Fine-Tuning. IEEE Trans Neural Netw Learn Syst. 2024 Aug 26;PP. doi: 10.1109/TNNLS.2024.3443394.
3. Tuning Vision-Language Models With Multiple Prototypes Clustering. IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11186-11199. doi: 10.1109/TPAMI.2024.3460180. Epub 2024 Nov 6.
4. CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation. Entropy (Basel). 2023 Sep 18;25(9):1353. doi: 10.3390/e25091353.
5. Utilizing Geographical Distribution Statistical Data to Improve Zero-Shot Species Recognition. Animals (Basel). 2024 Jun 7;14(12):1716. doi: 10.3390/ani14121716.
6. Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders. Sci Rep. 2024 Oct 5;14(1):23199. doi: 10.1038/s41598-024-73695-z.
7. One-Shot Adaptation of GAN in Just One CLIP. IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12179-12191. doi: 10.1109/TPAMI.2023.3283551. Epub 2023 Sep 5.
8. Few-shot disease recognition algorithm based on supervised contrastive learning. Front Plant Sci. 2024 Feb 7;15:1341831. doi: 10.3389/fpls.2024.1341831. eCollection 2024.
9. Open-Pose 3D zero-shot learning: Benchmark and challenges. Neural Netw. 2025 Jan;181:106775. doi: 10.1016/j.neunet.2024.106775. Epub 2024 Oct 9.
10. SCL: Self-supervised contrastive learning for few-shot image classification. Neural Netw. 2023 Aug;165:19-30. doi: 10.1016/j.neunet.2023.05.037. Epub 2023 May 24.