Panoptic Image Segmentation Method Based on Dynamic Instance Query

Authors

Yang Lanshi, Wang Shiguo, Teng Shuhua

Affiliations

School of Computer Science and Technology, Changsha University of Science and Technology, Changsha 410076, China.

School of Electronic Information, Hunan First Normal University, Changsha 410205, China.

Publication

Sensors (Basel). 2025 May 5;25(9):2919. doi: 10.3390/s25092919.

Abstract

Panoptic segmentation, a key task in computer vision, is important in practical applications such as autonomous driving and robot vision. Among deep-learning-based panoptic segmentation methods, query-based methods have received widespread attention. However, existing methods such as Mask2Former typically rely on a static query mechanism. This makes it difficult for the model to adapt to changes in the number of instances across scenes and can lead to lost or confused instances, limiting performance in complex scenes. Such methods are also prone to insufficient feature extraction and loss of global information. To address these problems, this paper proposes a panoptic segmentation method based on dynamic instance queries (PSM-DIQ). PSM-DIQ uses a multi-dimensional attention mechanism to enhance feature extraction, uses instance-activation-guided dynamic query generation to better distinguish between instances, and optimizes pixel-query interactions through a dual-path Transformer decoder. Experiments on the Cityscapes and MS COCO datasets show that, with a ResNet-50 backbone, PSM-DIQ outperforms the Mask2Former baseline, improving PQ by 1.8 and 1.7 percentage points, respectively. These results verify the effectiveness of PSM-DIQ for panoptic segmentation of complex scenes. Finally, this work will be released as an open-source software package on GitHub (v1.0).
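
The abstract reports its gains in PQ (panoptic quality) without defining the metric. For readers unfamiliar with it, the following is a minimal sketch of the standard per-class PQ computation: the sum of IoUs over matched segments divided by TP + 0.5 FP + 0.5 FN, where a match requires IoU greater than 0.5. The function name and the input numbers are illustrative assumptions and are not taken from the paper or its code.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Return (PQ, SQ, RQ) for a single class.

    matched_ious: IoU of each predicted/ground-truth segment pair counted
    as a true positive (standard rule: IoU strictly greater than 0.5).
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("a true-positive match requires IoU > 0.5")

    tp = len(matched_ious)
    denom = tp + 0.5 * num_false_positives + 0.5 * num_false_negatives
    if denom == 0:
        # Class absent from both prediction and ground truth; evaluation
        # toolkits usually skip such classes when averaging over classes.
        return 0.0, 0.0, 0.0

    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / denom                             # recognition quality
    pq = sum(matched_ious) / denom              # equals sq * rq
    return pq, sq, rq


# Hypothetical example: three matched segments, one FP, one FN.
pq, sq, rq = panoptic_quality([0.9, 0.8, 0.7], 1, 1)
print(round(pq, 3), round(sq, 3), round(rq, 3))  # 0.6 0.8 0.75
```

Benchmark PQ figures such as those quoted above are normally the mean of this per-class value over all categories, expressed as a percentage, so a gain of 1.8 points corresponds to 0.018 on the 0 to 1 scale used in the sketch.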

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9938/12074491/06b8473fbf70/sensors-25-02919-g001.jpg
