

Enhancing Query Formulation for Universal Image Segmentation

Authors

Qu Yipeng, Kim Joohee

Affiliation

Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.

Publication

Sensors (Basel). 2024 Mar 14;24(6):1879. doi: 10.3390/s24061879.

DOI: 10.3390/s24061879
PMID: 38544142
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10974238/
Abstract

Recent advancements in image segmentation have been notably driven by Vision Transformers. These transformer-based models offer one versatile network structure capable of handling a variety of segmentation tasks. Despite their effectiveness, the pursuit of enhanced capabilities often leads to more intricate architectures and greater computational demands. OneFormer has responded to these challenges by introducing a query-text contrastive learning strategy active during training only. However, this approach has not completely addressed the inefficiency issues in text generation and the contrastive loss computation. To solve these problems, we introduce Efficient Query Optimizer (EQO), an approach that efficiently utilizes multi-modal data to refine query optimization in image segmentation. Our strategy significantly reduces the complexity of parameters and computations by distilling inter-class and inter-task information from an image into a single template sentence. Furthermore, we propose a novel attention-based contrastive loss. It is designed to facilitate a one-to-many matching mechanism in the loss computation, which helps object queries learn more robust representations. Beyond merely reducing complexity, our model demonstrates superior performance compared to OneFormer across all three segmentation tasks using the Swin-T backbone. Our evaluations on the ADE20K dataset reveal that our model outperforms OneFormer in multiple metrics: by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ). These results highlight the efficacy of our model in advancing the field of image segmentation.
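The abstract names two concrete mechanisms: distilling inter-class and inter-task information into a single template sentence, and an attention-based contrastive loss with one-to-many matching between object queries and text. The sketch below illustrates what a multi-positive (one-to-many) contrastive loss of this general shape can look like in NumPy; the template format, function names, and exact loss form are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def template_sentence(task, class_names):
    # Hypothetical template: the paper distills inter-class and inter-task
    # information into one sentence; the exact wording here is an assumption.
    return f"a {task} segmentation of an image containing " + ", ".join(class_names)

def one_to_many_contrastive_loss(queries, text_emb, positive_mask, temperature=0.07):
    """Schematic multi-positive (one-to-many) contrastive loss.

    queries:       (N, d) object-query embeddings
    text_emb:      (d,)   embedding of the single template sentence
    positive_mask: (N,)   True for every query allowed to match the text
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    logits = q @ t / temperature            # similarity of each query to the text
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # One-to-many: the text may match several queries, so the probability
    # mass is summed over all positives before taking the log.
    return -np.log(probs[positive_mask].sum() + 1e-9)
```

With this form, a text embedding that aligns with several queries yields a low loss when those queries are marked positive, and a high loss when the mask points at unrelated queries, which is the behavior a one-to-many matching scheme needs.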


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f38/10974238/1ea8dd06c735/sensors-24-01879-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f38/10974238/843ab88e3d0e/sensors-24-01879-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f38/10974238/80b53b370629/sensors-24-01879-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f38/10974238/b14480c54763/sensors-24-01879-g0A1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f38/10974238/c48e25f0c822/sensors-24-01879-g0A2.jpg

Similar Articles

1. Enhancing Query Formulation for Universal Image Segmentation. Sensors (Basel). 2024 Mar 14;24(6):1879. doi: 10.3390/s24061879.
2. SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images. Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
3. A novel adaptive cubic quasi-Newton optimizer for deep learning based medical image analysis tasks, validated on detection of COVID-19 and segmentation for COVID-19 lung infection, liver tumor, and optic disc/cup. Med Phys. 2023 Mar;50(3):1528-1538. doi: 10.1002/mp.15969. Epub 2022 Oct 6.
4. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations. Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
5. DiagSWin: A multi-scale vision transformer with diagonal-shaped windows for object detection and segmentation. Neural Netw. 2024 Dec;180:106653. doi: 10.1016/j.neunet.2024.106653. Epub 2024 Aug 22.
6. Swin-MFA: A Multi-Modal Fusion Attention Network Based on Swin-Transformer for Low-Light Image Human Segmentation. Sensors (Basel). 2022 Aug 19;22(16):6229. doi: 10.3390/s22166229.
7. CTDUNet: A Multimodal CNN-Transformer Dual U-Shaped Network with Coordinate Space Attention for Pests and Diseases Segmentation in Complex Environments. Plants (Basel). 2024 Aug 15;13(16):2274. doi: 10.3390/plants13162274.
8. CLIP-Driven Prototype Network for Few-Shot Semantic Segmentation. Entropy (Basel). 2023 Sep 18;25(9):1353. doi: 10.3390/e25091353.
9. Unsupervised Pre-Training for Detection Transformers. IEEE Trans Pattern Anal Mach Intell. 2023 Nov;45(11):12772-12782. doi: 10.1109/TPAMI.2022.3216514. Epub 2023 Oct 3.
10. Boosting Semantic Segmentation by Conditioning the Backbone with Semantic Boundaries. Sensors (Basel). 2023 Aug 6;23(15):6980. doi: 10.3390/s23156980.

Cited By

1. Efficient Multi-Task Training with Adaptive Feature Alignment for Universal Image Segmentation. Sensors (Basel). 2025 Jan 9;25(2):359. doi: 10.3390/s25020359.
2. R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut. Sensors (Basel). 2024 Apr 24;24(9):2695. doi: 10.3390/s24092695.

References Cited in This Article

1. DN-DETR: Accelerate DETR Training by Introducing Query DeNoising. IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2239-2251. doi: 10.1109/TPAMI.2023.3335410. Epub 2024 Mar 6.
2. Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1623-1637. doi: 10.1109/TPAMI.2020.3019967. Epub 2022 Feb 3.
3. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans Pattern Anal Mach Intell. 2018 Apr;40(4):834-848. doi: 10.1109/TPAMI.2017.2699184. Epub 2017 Apr 27.
4. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans Pattern Anal Mach Intell. 2017 Dec;39(12):2481-2495. doi: 10.1109/TPAMI.2016.2644615. Epub 2017 Jan 2.