Yang Lanshi, Wang Shiguo, Teng Shuhua
School of Computer Science and Technology, Changsha University of Science and Technology, Changsha 410076, China.
School of Electronic Information, Hunan First Normal University, Changsha 410205, China.
Sensors (Basel). 2025 May 5;25(9):2919. doi: 10.3390/s25092919.
Panoptic segmentation is a key task in computer vision and is important in practical applications such as autonomous driving and robot vision. Among deep-learning-based panoptic segmentation methods, query-based methods have received widespread attention. However, existing methods such as Mask2Former typically rely on a static query mechanism. This makes it difficult for the model to adapt to the varying number of instances across scenes and can lead to missed or confused instances, which limits performance in complex scenes. These methods are also prone to insufficient feature extraction and loss of global information. To address these problems, this paper proposes a panoptic segmentation method based on dynamic instance queries (PSM-DIQ). PSM-DIQ uses a multi-dimensional attention mechanism to enhance feature extraction, instance-activation-guided dynamic query generation to better distinguish between instances, and a dual-path Transformer decoder to optimize pixel-query interactions. Experiments on the Cityscapes and MS COCO datasets show that, with a ResNet-50 backbone, PSM-DIQ outperforms the Mask2Former baseline, improving PQ by 1.8 and 1.7 percentage points, respectively. These results verify the effectiveness of PSM-DIQ for panoptic segmentation of complex scenes. Finally, this work will be released as an open-source software package on GitHub (v1.0).
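The abstract does not say how instance activations are turned into queries. To illustrate the contrast with a static query set (a fixed number of learned queries, as in Mask2Former), the Python sketch below assumes a hypothetical per-pixel instance-activation map and keeps its local maxima above a threshold as queries, so the query count follows the number of activated instances. The names `Query`, `is_local_peak`, and `generate_dynamic_queries`, the 8-neighbour peak test, and the `threshold` and `max_queries` parameters are all assumptions made for illustration; this is not PSM-DIQ's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical instance query: source pixel, activation score, feature vector."""
    position: tuple[int, int]
    score: float
    feature: list[float]


def is_local_peak(activation: list[list[float]], y: int, x: int) -> bool:
    """True if no 8-connected neighbour has a strictly higher activation (ties count as peaks)."""
    h, w = len(activation), len(activation[0])
    centre = activation[y][x]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            # Skip the centre pixel itself and any neighbour outside the map.
            if (dy or dx) and 0 <= ny < h and 0 <= nx < w and activation[ny][nx] > centre:
                return False
    return True


def generate_dynamic_queries(
    activation: list[list[float]],
    features: list[list[list[float]]],
    threshold: float = 0.5,
    max_queries: int = 100,
) -> list[Query]:
    """Turn activation peaks above `threshold` into queries (assumed selection rule).

    The number of returned queries depends on how many peaks the map contains,
    unlike a static query set whose size is fixed in advance.
    """
    queries = []
    for y, row in enumerate(activation):
        for x, score in enumerate(row):
            if score > threshold and is_local_peak(activation, y, x):
                # Use the pixel's feature vector as the query's initial content.
                queries.append(Query((y, x), score, list(features[y][x])))
    # Keep the strongest activations first and cap the total count.
    queries.sort(key=lambda q: q.score, reverse=True)
    return queries[:max_queries]


if __name__ == "__main__":
    # Toy 5 x 5 activation map with two blobs (made-up values).
    activation = [
        [0.1, 0.2, 0.1, 0.0, 0.0],
        [0.2, 0.9, 0.3, 0.0, 0.0],
        [0.1, 0.3, 0.2, 0.1, 0.1],
        [0.0, 0.0, 0.1, 0.7, 0.2],
        [0.0, 0.0, 0.1, 0.2, 0.1],
    ]
    # Toy 2-D features: each pixel's feature is just its (row, column).
    features = [[[float(y), float(x)] for x in range(5)] for y in range(5)]
    for q in generate_dynamic_queries(activation, features):
        print(q.position, q.score, q.feature)
```

On the toy map this returns two queries, at (1, 1) and (3, 3). Raising `threshold` to 0.8 leaves only one, and a map with no strong activation yields none, which is the adaptive behaviour a fixed query set lacks. In a real network the peak test would usually run on tensors, for example by comparing the map with a max-pooled copy of itself; the nested-list version here only mimics that.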