

Mask-Guided Vision Transformer for Few-Shot Learning.

Authors

Chen Yuzhong, Xiao Zhenxiang, Pan Yi, Zhao Lin, Dai Haixing, Wu Zihao, Li Changhe, Zhang Tuo, Li Changying, Zhu Dajiang, Liu Tianming, Jiang Xi

Publication

IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9636-9647. doi: 10.1109/TNNLS.2024.3418527. Epub 2025 May 2.

DOI: 10.1109/TNNLS.2024.3418527
PMID: 38976473
Abstract

Learning with little data is challenging but often inevitable in various application scenarios where the labeled data are limited and costly. Recently, few-shot learning (FSL) gained increasing attention because of its generalizability of prior knowledge to new tasks that contain only a few samples. However, for data-intensive models such as vision transformer (ViT), current fine-tuning-based FSL approaches are inefficient in knowledge generalization and, thus, degenerate the downstream task performances. In this article, we propose a novel mask-guided ViT (MG-ViT) to achieve an effective and efficient FSL on the ViT model. The key idea is to apply a mask on image patches to screen out the task-irrelevant ones and to guide the ViT focusing on task-relevant and discriminative patches during FSL. Particularly, MG-ViT only introduces an additional mask operation and a residual connection, enabling the inheritance of parameters from pretrained ViT without any other cost. To optimally select representative few-shot samples, we also include an active learning-based sample selection method to further improve the generalizability of MG-ViT-based FSL. We evaluate the proposed MG-ViT on classification, object detection, and segmentation tasks using gradient-weighted class activation mapping (Grad-CAM) to generate masks. The experimental results show that the MG-ViT model significantly improves the performance and efficiency compared with general fine-tuning-based ViT and ResNet models, providing novel insights and a concrete approach toward generalizing data-intensive and large-scale deep learning models for FSL.
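The core mechanism described in the abstract — masking out task-irrelevant image patches and adding a residual connection so the pretrained ViT parameters can be inherited unchanged — can be sketched roughly as follows. This is a minimal illustration assuming per-patch saliency scores are already available (e.g. from Grad-CAM); the function name, the top-k masking rule, and all variable names are hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def mask_guided_patches(patch_embeddings, saliency, keep_ratio=0.5):
    """Zero out low-saliency (task-irrelevant) patch embeddings, then
    add a residual connection back to the original embeddings.

    patch_embeddings: (n_patches, dim) array of ViT patch embeddings
    saliency: (n_patches,) importance scores, e.g. from Grad-CAM
    keep_ratio: fraction of the most salient patches to keep unmasked
    """
    n_patches = patch_embeddings.shape[0]
    k = max(1, int(n_patches * keep_ratio))
    # indices of the k most salient (task-relevant) patches
    keep = np.argsort(saliency)[-k:]
    mask = np.zeros(n_patches)
    mask[keep] = 1.0
    masked = patch_embeddings * mask[:, None]
    # residual connection: the unmasked signal still flows through,
    # so pretrained ViT weights can be reused without modification
    return masked + patch_embeddings

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))   # 16 patches, 8-dim embeddings
sal = rng.random(16)               # per-patch importance scores
out = mask_guided_patches(x, sal, keep_ratio=0.25)
```

Under this sketch, salient patches are emphasized (their embeddings are doubled by mask-plus-residual) while the rest pass through unchanged, which matches the abstract's claim that only a mask operation and a residual connection are added on top of the pretrained model.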


Similar Articles

1. Mask-Guided Vision Transformer for Few-Shot Learning.
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9636-9647. doi: 10.1109/TNNLS.2024.3418527. Epub 2025 May 2.
2. Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study.
JMIR AI. 2023 May 4;2:e44293. doi: 10.2196/44293.
3. Eye-Gaze-Guided Vision Transformer for Rectifying Shortcut Learning.
IEEE Trans Med Imaging. 2023 Nov;42(11):3384-3394. doi: 10.1109/TMI.2023.3287572. Epub 2023 Oct 27.
4. Self-supervised learning improves robustness of deep learning lung tumor segmentation models to CT imaging differences.
Med Phys. 2025 Mar;52(3):1573-1588. doi: 10.1002/mp.17541. Epub 2024 Dec 5.
5. A conditional GAN-based approach for enhancing transfer learning performance in few-shot HCR tasks.
Sci Rep. 2022 Sep 29;12(1):16271. doi: 10.1038/s41598-022-20654-1.
6. Rectify ViT Shortcut Learning by Visual Saliency.
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):18013-18025. doi: 10.1109/TNNLS.2023.3310531. Epub 2024 Dec 2.
7. From laboratory to field: cross-domain few-shot learning for crop disease identification in the field.
Front Plant Sci. 2024 Dec 18;15:1434222. doi: 10.3389/fpls.2024.1434222. eCollection 2024.
8. Automated classification of oral cancer lesions: Vision transformers vs radiomics.
Comput Biol Med. 2025 May;189:109913. doi: 10.1016/j.compbiomed.2025.109913. Epub 2025 Feb 27.
9. Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-Based Noninvasive Digital System.
Int J Biomed Imaging. 2024 Feb 3;2024:3022192. doi: 10.1155/2024/3022192. eCollection 2024.
10. Multi-Learner Based Deep Meta-Learning for Few-Shot Medical Image Classification.
IEEE J Biomed Health Inform. 2023 Jan;27(1):17-28. doi: 10.1109/JBHI.2022.3215147. Epub 2023 Jan 5.

Cited By

1. A swin transformer and CNN fusion framework for accurate Parkinson disease classification in MRI.
Sci Rep. 2025 Apr 29;15(1):15117. doi: 10.1038/s41598-025-93671-5.