

SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation.

Authors

Xu Mengde, Zhang Zheng, Wei Fangyun, Hu Han, Bai Xiang

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15546-15561. doi: 10.1109/TPAMI.2023.3311618. Epub 2023 Nov 3.

DOI: 10.1109/TPAMI.2023.3311618
PMID: 37665708
Abstract

This article concentrates on open-vocabulary semantic segmentation, where a well-optimized model is able to segment arbitrary categories that appear in an image. To achieve this goal, we present a novel framework termed Side Adapter Network, or SAN for short. Our design principles are three-fold: 1) Recent large-scale vision-language models (e.g., CLIP) show promising open-vocabulary image classification capability, and adapting a pre-trained CLIP model to open-vocabulary semantic segmentation is economical in training cost. 2) Our SAN model should be both lightweight and effective in order to reduce the inference cost. To achieve this, we fuse the CLIP model's intermediate features to enhance the representation capability of the SAN model, and drive the CLIP model to focus on the informative areas of an image with the aid of the attention biases predicted by the side adapter network. 3) Our approach should empower mainstream segmentation architectures with the capability of open-vocabulary segmentation, so we present P-SAN and R-SAN to support the widely adopted pixel-wise and region-wise segmentation paradigms, respectively. Experimentally, our approach achieves state-of-the-art performance on 5 commonly used benchmarks while having far fewer trainable parameters and GFLOPs. For instance, our R-SAN outperforms the previous best method, OvSeg, by +2.3 averaged mIoU across all benchmarks while using only 6% of the trainable parameters and less than 1% of the GFLOPs. In addition, we also conduct a comprehensive analysis of the open-vocabulary semantic segmentation datasets and verify the feasibility of transferring a well-optimized R-SAN model to the video segmentation task.

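The abstract describes three moving parts: fusing CLIP's intermediate features into the lightweight side network, predicting attention biases that steer CLIP toward informative regions, and (in R-SAN) recognizing each region against open-vocabulary text embeddings. The sketch below illustrates this data flow with plain numpy; all shapes, the additive fusion, and the softmax-pooled recognition step are illustrative assumptions, not the paper's actual transformer-based architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes (not from the paper): 196 image tokens, 512-d CLIP
# features, 64-d side-adapter tokens, 8 mask queries, 20 category names.
T, D_CLIP, D_SAN, Q, C = 196, 512, 64, 8, 20

def fuse_clip_features(clip_feats, san_tokens, proj):
    """Fuse a frozen CLIP layer's intermediate features (T, D_CLIP) into the
    side adapter's tokens (T, D_SAN) via a learned projection (random here)."""
    return san_tokens + clip_feats @ proj

def predict_attention_bias(san_tokens, query_embed):
    """Each mask query scores every image token; in SAN these scores are added
    to CLIP's attention logits so CLIP attends to informative areas."""
    return query_embed @ san_tokens.T  # (Q, T)

def region_recognition(clip_feats, attn_bias, text_embeds):
    """R-SAN-style region recognition: bias-weighted pooling of CLIP features
    per query, then cosine similarity against category text embeddings."""
    w = np.exp(attn_bias - attn_bias.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)              # softmax over image tokens
    region = w @ clip_feats                        # (Q, D_CLIP) pooled features
    region /= np.linalg.norm(region, axis=1, keepdims=True)
    text = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    return region @ text.T                         # (Q, C) cosine logits

clip_feats = rng.standard_normal((T, D_CLIP))
san_tokens = rng.standard_normal((T, D_SAN))
proj = rng.standard_normal((D_CLIP, D_SAN)) * 0.01
query_embed = rng.standard_normal((Q, D_SAN))
text_embeds = rng.standard_normal((C, D_CLIP))    # stand-in for CLIP text encoder

fused = fuse_clip_features(clip_feats, san_tokens, proj)
bias = predict_attention_bias(fused, query_embed)
logits = region_recognition(clip_feats, bias, text_embeds)
print(logits.shape)  # (8, 20): one open-vocabulary score vector per mask query
```

Because recognition reduces to cosine similarity against text embeddings, the category list can be swapped at inference time without retraining, which is the essence of the open-vocabulary setting.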

Similar Articles

1. SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15546-15561. doi: 10.1109/TPAMI.2023.3311618. Epub 2023 Nov 3.
2. CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation. Entropy (Basel). 2023 Sep 18;25(9):1353. doi: 10.3390/e25091353.
3. DTS-Net: Depth-to-Space Networks for Fast and Accurate Semantic Object Segmentation. Sensors (Basel). 2022 Jan 3;22(1):337. doi: 10.3390/s22010337.
4. Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding. IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8517-8533. doi: 10.1109/TPAMI.2024.3410324. Epub 2024 Nov 6.
5. A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future. IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8954-8975. doi: 10.1109/TPAMI.2024.3413013. Epub 2024 Nov 6.
6. Coarse-to-Fine Semantic Segmentation From Image-Level Labels. IEEE Trans Image Process. 2020;29:225-236. doi: 10.1109/TIP.2019.2926748. Epub 2019 Jul 12.
7. Semantic segmentation of autonomous driving scenes based on multi-scale adaptive attention mechanism. Front Neurosci. 2023 Oct 19;17:1291674. doi: 10.3389/fnins.2023.1291674. eCollection 2023.
8. SurgiNet: Pyramid Attention Aggregation and Class-wise Self-Distillation for Surgical Instrument Segmentation. Med Image Anal. 2022 Feb;76:102310. doi: 10.1016/j.media.2021.102310. Epub 2021 Dec 4.
9. W-Net: Dense and diagnostic semantic segmentation of subcutaneous and breast tissue in ultrasound images by incorporating ultrasound RF waveform data. Med Image Anal. 2022 Feb;76:102326. doi: 10.1016/j.media.2021.102326. Epub 2021 Dec 5.
10. Rethinking 1D convolution for lightweight semantic segmentation. Front Neurorobot. 2023 Feb 9;17:1119231. doi: 10.3389/fnbot.2023.1119231. eCollection 2023.

Cited By

1. ItpCtrl-AI: End-to-end interpretable and controllable artificial intelligence by modeling radiologists' intentions. Artif Intell Med. 2025 Feb;160:103054. doi: 10.1016/j.artmed.2024.103054. Epub 2024 Dec 12.
2. LT-DeepLab: an improved DeepLabV3+ cross-scale segmentation algorithm for Zanthoxylum bungeanum Maxim leaf-trunk diseases in real-world environments. Front Plant Sci. 2024 Oct 22;15:1423238. doi: 10.3389/fpls.2024.1423238. eCollection 2024.
3. Automatic segmentation of 15 critical anatomical labels and measurements of cardiac axis and cardiothoracic ratio in fetal four chambers using nnU-NetV2. BMC Med Inform Decis Mak. 2024 May 21;24(1):128. doi: 10.1186/s12911-024-02527-x.