

CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation

Authors

Guo Shi-Cheng, Liu Shang-Kun, Wang Jing-Yu, Zheng Wei-Min, Jiang Cheng-Yu

Affiliation

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China.

Publication

Entropy (Basel). 2023 Sep 18;25(9):1353. doi: 10.3390/e25091353.

DOI: 10.3390/e25091353
PMID: 37761652
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10529322/
Abstract

Recent research has shown that visual-text pretrained models perform well in traditional vision tasks. CLIP, as the most influential work, has garnered significant attention from researchers. Thanks to its excellent visual representation capabilities, many recent studies have used CLIP for pixel-level tasks. We explore the potential abilities of CLIP in the field of few-shot segmentation. The current mainstream approach is to utilize support and query features to generate class prototypes and then use the prototype features to match image features. We propose a new method that utilizes CLIP to extract text features for a specific class. These text features are then used as training samples to participate in the model's training process. The addition of text features enables the model to extract features that contain richer semantic information, thus making it easier to capture potential class information. To better match the query image features, we also propose a new prototype generation method that incorporates multi-modal fusion features of text and images in the prototype generation process. Adaptive query prototypes are generated by combining foreground and background information from the images with the multi-modal support prototype, thereby allowing for a better matching of image features and improved segmentation accuracy. We provide a new perspective on the task of few-shot segmentation in multi-modal scenarios. Experiments demonstrate that our proposed method achieves excellent results on two common datasets, PASCAL-5i and COCO-20i.
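The pipeline the abstract describes can be sketched in three steps: pool support-image features under the foreground mask into a visual class prototype, fuse that prototype with a CLIP-style text embedding for the class, and match the fused prototype against every query pixel. The sketch below is an illustrative assumption, not the paper's exact model: all shapes, the convex-combination fusion rule, and the function names are made up for clarity, and a real system would obtain the features from CLIP's image and text encoders rather than random arrays.

```python
import numpy as np

def masked_average_pool(feat, mask):
    """Average C-dim features over foreground pixels.

    feat: (C, H, W) support feature map; mask: (H, W) binary foreground mask.
    Returns a (C,) visual class prototype.
    """
    weights = mask / (mask.sum() + 1e-8)
    return (feat * weights[None]).reshape(feat.shape[0], -1).sum(axis=1)

def fuse_prototype(visual_proto, text_feat, alpha=0.5):
    """Blend the visual prototype with a CLIP-style text feature for the class.

    A simple convex combination of the L2-normalized vectors; the paper's
    multi-modal fusion may differ.
    """
    v = visual_proto / np.linalg.norm(visual_proto)
    t = text_feat / np.linalg.norm(text_feat)
    return alpha * v + (1 - alpha) * t

def match_query(query_feat, proto):
    """Cosine-similarity matching of the prototype against every query pixel.

    query_feat: (C, H, W); returns an (H, W) similarity map that a threshold
    or a small decoder head would turn into a segmentation mask.
    """
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, -1)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    p = proto / np.linalg.norm(proto)
    return (p @ q).reshape(H, W)

# Toy inputs standing in for CLIP encoder outputs.
rng = np.random.default_rng(0)
support_feat = rng.standard_normal((32, 8, 8))
support_mask = (rng.random((8, 8)) > 0.5).astype(np.float64)
text_feat = rng.standard_normal(32)
query_feat = rng.standard_normal((32, 8, 8))

proto = fuse_prototype(masked_average_pool(support_feat, support_mask), text_feat)
sim_map = match_query(query_feat, proto)
print(sim_map.shape)
```

Because the prototype and each pixel feature are unit-normalized before the dot product, every value in `sim_map` lies in [-1, 1]; the adaptive query-prototype step the abstract mentions would further refine `proto` using the query image's own foreground/background statistics.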


Figures (g001–g006):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3908/10529322/314865ddae86/entropy-25-01353-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3908/10529322/61eff1d5c07f/entropy-25-01353-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3908/10529322/173e0b3270c0/entropy-25-01353-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3908/10529322/a7da2c85f41f/entropy-25-01353-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3908/10529322/814f7336ec3b/entropy-25-01353-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3908/10529322/0ca21d26fcd2/entropy-25-01353-g006.jpg

Similar Articles

1. CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation. Entropy (Basel). 2023 Sep 18;25(9):1353. doi: 10.3390/e25091353.
2. Few-shot segmentation with duplex network and attention augmented module. Front Neurorobot. 2023 Jun 21;17:1206189. doi: 10.3389/fnbot.2023.1206189. eCollection 2023.
3. Dual Branch Multi-Level Semantic Learning for Few-Shot Segmentation. IEEE Trans Image Process. 2024;33:1432-1447. doi: 10.1109/TIP.2024.3364056. Epub 2024 Feb 21.
4. Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation. Sensors (Basel). 2023 Jul 22;23(14):6612. doi: 10.3390/s23146612.
5. A Self-Supervised Few-Shot Semantic Segmentation Method Based on Multi-Task Learning and Dense Attention Computation. Sensors (Basel). 2024 Jul 31;24(15):4975. doi: 10.3390/s24154975.
6. Prototype Adaption and Projection for Few- and Zero-Shot 3D Point Cloud Semantic Segmentation. IEEE Trans Image Process. 2023;32:3199-3211. doi: 10.1109/TIP.2023.3279660. Epub 2023 Jun 7.
7. Transductive meta-learning with enhanced feature ensemble for few-shot semantic segmentation. Sci Rep. 2024 Feb 18;14(1):4028. doi: 10.1038/s41598-024-54640-6.
8. MCEENet: Multi-Scale Context Enhancement and Edge-Assisted Network for Few-Shot Semantic Segmentation. Sensors (Basel). 2023 Mar 8;23(6):2922. doi: 10.3390/s23062922.
9. Prototype-Guided Graph Reasoning Network for Few-Shot Medical Image Segmentation. IEEE Trans Med Imaging. 2025 Feb;44(2):761-773. doi: 10.1109/TMI.2024.3459943. Epub 2025 Feb 4.
10. DRNet: Double Recalibration Network for Few-Shot Semantic Segmentation. IEEE Trans Image Process. 2022;31:6733-6746. doi: 10.1109/TIP.2022.3215905. Epub 2022 Oct 28.
