Xian Peng-Fei, Po Lai-Man, Xiong Jing-Jing, Zhao Yu-Zhi, Yu Wing-Yin, Cheung Kwok-Wai
Department of Electronic Engineering, City University of Hong Kong, Hong Kong.
School of Communication, The Hang Seng University of Hong Kong, Hong Kong.
Sensors (Basel). 2024 Feb 22;24(5):1411. doi: 10.3390/s24051411.
In this paper, we introduce a novel panoptic segmentation method called the Mask-Pyramid Network. Existing Mask RCNN-based methods first generate a large number of box proposals and then filter them at each feature level, which is computationally expensive, even though most of the proposals are subsequently suppressed and discarded by Non-Maximum Suppression. In addition, panoptic segmentation requires fusing the semantic segmentation results with the instance segmentation results produced by Mask RCNN, and doing this properly is difficult. To address these issues, we propose a mask pyramid mechanism that distinguishes objects and generates far fewer proposals by referring to already segmented masks, thereby reducing computational cost. The Mask-Pyramid Network generates object proposals and predicts masks in order from larger to smaller objects. It records the pixel area occupied by the larger object masks and then generates proposals only in the unoccupied areas. Each object mask is represented as an H × W × 1 logit map, which matches the format of the semantic segmentation logits. Applying softmax to the concatenated semantic and instance segmentation logits therefore fuses the two results easily and naturally. We empirically demonstrate that the proposed Mask-Pyramid Network achieves comparable accuracy on the Cityscapes and COCO datasets. We also demonstrate the computational efficiency of the proposed method, with competitive results.
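The large-to-small proposal idea can be illustrated with a toy sketch. This is not the authors' implementation: the function name `select_masks_large_to_small`, the representation of a proposal as a centre pixel plus a boolean mask, and the use of mask area as the ordering criterion are all assumptions made for illustration only.

```python
import numpy as np

def select_masks_large_to_small(candidates):
    """Toy occupancy-guided selection (hypothetical helper, not from the paper).

    candidates: list of ((row, col), mask) pairs, where mask is a boolean
    H x W array and (row, col) is the pixel at which the proposal is made.
    """
    # Process larger masks first, mirroring the large-to-small order.
    ordered = sorted(candidates, key=lambda c: int(c[1].sum()), reverse=True)
    occupied = np.zeros_like(ordered[0][1], dtype=bool)
    kept = []
    for (row, col), mask in ordered:
        if occupied[row, col]:
            # The proposal lies on an area already covered by a larger mask.
            continue
        free_part = mask & ~occupied  # keep only pixels that are still free
        kept.append(free_part)
        occupied |= free_part         # record the newly occupied area
    return kept, occupied
```

In this sketch a smaller candidate whose proposal pixel already belongs to a larger mask is never evaluated, which is the source of the reduced proposal count described above.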
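The fusion step described in the abstract, applying softmax to the concatenated semantic and instance logits, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `fuse_panoptic`, the channel ordering (semantic channels first, then one channel per instance), and the final argmax used to read off a label are all hypothetical choices.

```python
import numpy as np

def fuse_panoptic(semantic_logits, instance_logits):
    """Fuse semantic and instance logits with one softmax (illustrative only).

    semantic_logits: H x W x C array of per-class logits.
    instance_logits: list of H x W x 1 arrays, one per object mask.
    """
    # Concatenate along the channel axis: C semantic + K instance channels.
    stacked = np.concatenate([semantic_logits, *instance_logits], axis=-1)
    # Numerically stable softmax over the channel axis.
    shifted = stacked - stacked.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Assumed readout: channel index < C is a semantic class,
    # index >= C is instance number (index - C).
    labels = probs.argmax(axis=-1)
    return probs, labels
```

Because every mask is an H × W × 1 logit map, no resizing or pasting of box-level masks is needed before the concatenation in this sketch.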