Kato Naoki, Nota Yoshiki, Aoki Yoshimitsu
Department of Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Kanagawa, Japan.
Meidensha Corporation, Tokyo 141-6029, Japan.
Sensors (Basel). 2024 Jun 4;24(11):3624. doi: 10.3390/s24113624.
Large vision-language models, such as Contrastive Language-Image Pre-training (CLIP), pre-trained on large-scale image-text datasets, have demonstrated robust zero-shot transfer capabilities across various downstream tasks. To further enhance the few-shot recognition performance of CLIP, Tip-Adapter augments the CLIP model with an adapter that incorporates a key-value cache model constructed from the few-shot training set. This approach enables training-free adaptation and has shown significant improvements in few-shot recognition, especially with additional fine-tuning. However, the size of the adapter grows in proportion to the number of training samples, making it difficult to deploy in practical applications. In this paper, we propose a novel CLIP adaptation method, named Proto-Adapter, which employs a single-layer adapter of constant size regardless of the amount of training data and even outperforms Tip-Adapter. Proto-Adapter constructs the adapter's weights from prototype representations of each class. By aggregating the features of the training samples, it reduces the size of the adapter without compromising performance. Moreover, the performance of the model can be further enhanced by fine-tuning the adapter's weights with a distance margin penalty, which imposes additional inter-class discrepancy on the output logits. We posit that this training scheme yields a model with a discriminative decision boundary even when trained on a limited amount of data. We demonstrate the effectiveness of the proposed method through extensive few-shot classification experiments on diverse datasets.
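The core idea, a constant-size adapter built from per-class prototypes whose logits are blended with CLIP's zero-shot scores, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the `alpha` and `beta` hyper-parameters and the exponential affinity follow the Tip-Adapter convention, and all function names are illustrative.

```python
import numpy as np

def build_proto_adapter(features, labels, n_classes):
    """Build a constant-size adapter from class prototypes.

    features: (N, D) L2-normalized few-shot image features
    labels:   (N,) integer class labels
    Returns an (n_classes, D) weight matrix, one prototype per class,
    so the adapter size is independent of the number of shots.
    """
    W = np.zeros((n_classes, features.shape[1]))
    for c in range(n_classes):
        proto = features[labels == c].mean(axis=0)  # aggregate shots
        W[c] = proto / np.linalg.norm(proto)        # re-normalize the mean
    return W

def adapted_logits(x, proto_W, clip_text_W, alpha=1.0, beta=5.0):
    """Blend zero-shot CLIP logits with prototype-affinity logits.

    x: (D,) normalized test image feature
    clip_text_W: (n_classes, D) normalized CLIP text embeddings
    alpha, beta: assumed blending/sharpness hyper-parameters
    """
    zero_shot = x @ clip_text_W.T                    # CLIP zero-shot scores
    affinity = np.exp(-beta * (1.0 - x @ proto_W.T)) # prototype affinity
    return zero_shot + alpha * affinity
```

Because `proto_W` has one row per class rather than one per training sample (as in Tip-Adapter's key-value cache), the adapter's footprint stays fixed as the number of shots grows; fine-tuning would then update `proto_W` under the distance margin penalty described above.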