Enhancing Few-Shot CLIP With Semantic-Aware Fine-Tuning

Authors

Zhu Yao, Chen Yuefeng, Mao Xiaofeng, Yan Xiu, Wang Yue, Lu Wang, Wang Jindong, Ji Xiangyang

Publication information

IEEE Trans Neural Netw Learn Syst. 2024 Aug 26;PP. doi: 10.1109/TNNLS.2024.3443394.

DOI:10.1109/TNNLS.2024.3443394
PMID:39186419
Abstract

Learning generalized representations from limited training samples is crucial for applying deep neural networks in low-resource scenarios. Recently, methods based on contrastive language-image pretraining (CLIP) have exhibited promising performance in few-shot adaptation tasks. To avoid the catastrophic forgetting and overfitting caused by few-shot fine-tuning, existing works usually freeze the parameters of CLIP pretrained on large-scale datasets, overlooking the possibility that some parameters might not be suitable for downstream tasks. To this end, we revisit CLIP's visual encoder with a specific focus on its distinctive attention pooling layer, which performs a spatial weighted sum of the dense feature maps. Dense feature maps contain meaningful semantic information, and different semantics hold varying importance for diverse downstream tasks (for example, a pet classification task should prioritize semantics such as ears and eyes rather than side mirrors), so using the same weighted-sum operation for dense features across different few-shot tasks might not be appropriate. Hence, we propose fine-tuning the parameters of the attention pooling layer during training to encourage the model to focus on task-specific semantics. At inference, we perform residual blending between the features pooled by the fine-tuned and the original attention pooling layers to incorporate both the few-shot knowledge and the pretrained CLIP's prior knowledge. We term this method semantic-aware fine-tuning. It is effective in enhancing the conventional few-shot CLIP and is compatible with the existing adapter approach, which yields a combined variant. Extensive experiments on 11 benchmarks demonstrate that the method and its adapter-combined variant significantly outperform the second-best method by 1.51 and 2.38 in the one-shot setting and by 0.48 and 1.37 in the four-shot setting, respectively.
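
The two operations the abstract describes, a spatial weighted sum over dense features and a residual blend of the fine-tuned and original pooled features, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes a heavily simplified single-head pooling in which one query vector stands in for the attention pooling layer's parameters, and it assumes the blend is a convex combination controlled by a hypothetical coefficient `alpha`; the abstract does not specify either detail.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(dense_features, query):
    """Spatial weighted sum of dense features.

    Assumption: the weights come from scaled dot-product similarity between
    a single query vector and each spatial feature. CLIP's real layer also
    has key/value/output projections and multiple heads, omitted here.
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in dense_features
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * feat[j] for w, feat in zip(weights, dense_features))
        for j in range(dim)
    ]
    return pooled, weights


def residual_blend(tuned, original, alpha):
    """Blend fine-tuned and original pooled features.

    Assumption: a convex combination; `alpha` is a hypothetical coefficient.
    """
    return [alpha * t + (1.0 - alpha) * o for t, o in zip(tuned, original)]


# Three spatial positions, two feature channels (toy values).
dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
original_query = [0.0, 0.0]  # stands in for the frozen pretrained parameters
tuned_query = [4.0, 0.0]     # stands in for the few-shot fine-tuned parameters

pooled_orig, w_orig = attention_pool(dense, original_query)
pooled_tuned, w_tuned = attention_pool(dense, tuned_query)
blended = residual_blend(pooled_tuned, pooled_orig, alpha=0.5)
```

In this toy setup the zero query weights every position equally, while the tuned query shifts weight toward positions whose first channel is active, which mirrors the idea of emphasizing task-specific semantics. Setting `alpha` to 0 recovers the pretrained feature and setting it to 1 uses only the fine-tuned one.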

