TagCLIP: Improving Discrimination Ability of Zero-Shot Semantic Segmentation.

Authors

Li Jingyao, Chen Pengguang, Qian Shengju, Liu Shu, Jia Jiaya

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):11287-11297. doi: 10.1109/TPAMI.2024.3454647. Epub 2024 Nov 6.

DOI:10.1109/TPAMI.2024.3454647
PMID:39231046
Abstract

Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks. However, existing approaches that use CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes, confusing novel classes with semantically similar ones. In this work, we propose a novel approach, TagCLIP (Trusty-aware guided CLIP), to address this issue. We disentangle the ill-posed optimization problem into two parallel processes: semantic matching, performed individually, and reliability judgment, which improves discrimination ability. Building on the idea of special tokens in language modeling that represent sentence-level embeddings, we introduce a trusty token that makes it possible to distinguish novel classes from known ones at prediction time. To evaluate our approach, we conduct experiments on two benchmark datasets, PASCAL VOC 2012 and COCO-Stuff 164K. Our results show that TagCLIP improves the Intersection over Union (IoU) of unseen classes by 7.4% and 1.7%, respectively, with negligible overhead. The code is publicly available.
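To make the two parallel processes concrete, here is a minimal NumPy sketch. It assumes CLIP-style dense matching (cosine similarity between per-patch and per-class text embeddings, followed by a softmax) and a per-pixel reliability value r in [0, 1] that is read as "this pixel belongs to a seen class". It also includes a per-class IoU helper. The function names, the temperature, and the fusion rule (scale seen-class scores by r and unseen-class scores by 1 - r) are hypothetical illustrations, not the authors' implementation. In the paper the reliability signal comes from the trusty token, which is not modeled here.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def semantic_matching(patch_emb: np.ndarray, text_emb: np.ndarray,
                      temperature: float = 0.01) -> np.ndarray:
    """CLIP-style dense matching (assumed, for illustration).

    patch_emb: (H, W, D) patch embeddings; text_emb: (C, D) class-name embeddings.
    Returns (H, W, C) per-pixel softmax probabilities over all C classes.
    """
    logits = l2_normalize(patch_emb) @ l2_normalize(text_emb).T / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def fuse_with_reliability(probs: np.ndarray, reliability: np.ndarray,
                          seen_mask: np.ndarray) -> np.ndarray:
    """Hypothetical fusion rule (an assumption, not the paper's formula).

    reliability: (H, W) values in [0, 1], read as "this pixel belongs to a seen class".
    seen_mask: (C,) booleans, True for classes seen during training.
    Seen-class scores are scaled by r, unseen-class scores by 1 - r.
    """
    r = reliability[..., None]                        # (H, W, 1)
    weights = np.where(seen_mask, r, 1.0 - r)         # broadcasts to (H, W, C)
    return np.argmax(probs * weights, axis=-1)        # (H, W) label map


def iou_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    """Intersection over Union per class; NaN for classes absent from both maps."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious[c] = inter / union
    return ious


if __name__ == "__main__":
    # Toy setup: classes 0 and 1 are seen; class 2 is unseen but close to class 0.
    text_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.95, 0.31]])
    patch_emb = np.array([[[1.0, 0.0], [0.0, 1.0]]])  # H=1, W=2, D=2
    gt = np.array([[2, 1]])                           # pixel 0 is really the unseen class

    probs = semantic_matching(patch_emb, text_emb, temperature=0.1)
    plain = np.argmax(probs, axis=-1)                 # matching alone picks class 0
    fused = fuse_with_reliability(probs, np.array([[0.2, 0.9]]),
                                  np.array([True, True, False]))
    print(plain.tolist(), fused.tolist())
    print(iou_per_class(plain, gt, 3), iou_per_class(fused, gt, 3))
```

In this toy case, matching alone labels the first pixel as the semantically similar seen class 0, giving `[[0, 1]]`. The low reliability value flips it to the unseen class 2, giving `[[2, 1]]` and raising that class's IoU from 0 to 1. This mirrors, in miniature, the confusion between novel and similar classes that the abstract describes.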

Similar articles

1. From Pixel to Patch: Synthesize Context-Aware Features for Zero-Shot Semantic Segmentation.
IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7689-7703. doi: 10.1109/TNNLS.2022.3145962. Epub 2023 Oct 5.
2. Cross-Image Pixel Contrasting for Semantic Segmentation.
IEEE Trans Pattern Anal Mach Intell. 2024 Aug;46(8):5398-5412. doi: 10.1109/TPAMI.2024.3367952. Epub 2024 Jul 2.
3. Transformer-Based Approach Via Contrastive Learning for Zero-Shot Detection.
Int J Neural Syst. 2023 Jul;33(7):2350035. doi: 10.1142/S0129065723500351. Epub 2023 Jun 14.
4. CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation.
Entropy (Basel). 2023 Sep 18;25(9):1353. doi: 10.3390/e25091353.
5. Polarity Loss: Improving Visual-Semantic Alignment for Zero-Shot Detection.
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4066-4078. doi: 10.1109/TNNLS.2022.3184821. Epub 2025 Feb 28.
6. Semantics-Guided Contrastive Network for Zero-Shot Object Detection.
IEEE Trans Pattern Anal Mach Intell. 2024 Mar;46(3):1530-1544. doi: 10.1109/TPAMI.2021.3140070. Epub 2024 Feb 6.
7. Dual Branch Multi-Level Semantic Learning for Few-Shot Segmentation.
IEEE Trans Image Process. 2024;33:1432-1447. doi: 10.1109/TIP.2024.3364056. Epub 2024 Feb 21.
8. Enhancing Few-Shot CLIP With Semantic-Aware Fine-Tuning.
IEEE Trans Neural Netw Learn Syst. 2024 Aug 26;PP. doi: 10.1109/TNNLS.2024.3443394.
9. MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8380-8395. doi: 10.1109/TPAMI.2024.3404422. Epub 2024 Nov 6.