Enhancing Query Formulation for Universal Image Segmentation

Authors

Qu Yipeng, Kim Joohee

Affiliations

Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL 60616, USA.

Publication Information

Sensors (Basel). 2024 Mar 14;24(6):1879. doi: 10.3390/s24061879.

Abstract

Recent advancements in image segmentation have been driven largely by Vision Transformers. These transformer-based models offer a single, versatile network structure capable of handling a variety of segmentation tasks. Despite their effectiveness, the pursuit of greater capability often leads to more intricate architectures and higher computational demands. OneFormer responds to these challenges with a query-text contrastive learning strategy that is active only during training. However, this approach does not fully resolve the inefficiencies in text generation and in the contrastive loss computation. To address these problems, we introduce the Efficient Query Optimizer (EQO), an approach that efficiently exploits multi-modal data to refine query optimization in image segmentation. Our strategy significantly reduces parameter and computational complexity by distilling inter-class and inter-task information from an image into a single template sentence. Furthermore, we propose a novel attention-based contrastive loss designed to enable a one-to-many matching mechanism in the loss computation, which helps object queries learn more robust representations. Beyond reducing complexity, our model outperforms OneFormer across all three segmentation tasks with the Swin-T backbone. Our evaluations on the ADE20K dataset show that our model surpasses OneFormer on multiple metrics: by 0.2% in mean Intersection over Union (mIoU), 0.6% in Average Precision (AP), and 0.8% in Panoptic Quality (PQ). These results highlight the efficacy of our model in advancing the field of image segmentation.
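The abstract describes the attention-based contrastive loss and its one-to-many matching only at a high level. As a rough illustration, the sketch below implements a generic one-to-many query-text contrastive loss in PyTorch; the function name, the boolean match matrix, and the temperature value are assumptions made for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def one_to_many_contrastive_loss(queries, texts, match, temperature=0.07):
    """Sketch of a one-to-many query-text contrastive loss (illustrative only).

    queries: (Nq, D) object-query embeddings
    texts:   (Nt, D) text embeddings (e.g., derived from a template sentence)
    match:   (Nq, Nt) boolean matrix; match[i, j] is True when query i
             should align with text j. Several queries may share one
             text entry, which yields the one-to-many matching.
    """
    q = F.normalize(queries, dim=-1)
    t = F.normalize(texts, dim=-1)
    logits = q @ t.T / temperature            # (Nq, Nt) similarity scores
    log_prob = F.log_softmax(logits, dim=-1)  # per-query distribution over texts
    # Average the log-likelihood over each query's positive set, so every
    # matched text pulls the corresponding query toward its embedding.
    pos = match.float()
    per_query = -(log_prob * pos).sum(-1) / pos.sum(-1).clamp(min=1)
    # Score only queries that have at least one positive text.
    return per_query[pos.sum(-1) > 0].mean()
```

In a setting like the one described above, match could, for instance, mark each query that bipartite matching assigns to a ground-truth object against the text entry for that object's class, so all queries of one class share a single template-derived text embedding.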


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f38/10974238/b14480c54763/sensors-24-01879-g0A1.jpg
