Fast Panoptic Segmentation with Soft Attention Embeddings.

Affiliation

Computer Science Department, Technical University of Cluj-Napoca, Memorandumului 28, 400114 Cluj-Napoca, Romania.

Publication information

Sensors (Basel). 2022 Jan 20;22(3):783. doi: 10.3390/s22030783.

DOI:10.3390/s22030783

PMID:35161529

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8837929/

Abstract

Panoptic segmentation provides a rich 2D environment representation by unifying semantic and instance segmentation. Most current state-of-the-art panoptic segmentation methods are built upon two-stage detectors and, because of their high computational complexity, are not suitable for real-time applications such as automated driving. In this work, we introduce a novel, fast and accurate single-stage panoptic segmentation network that employs a shared feature extraction backbone and three network heads: object detection, semantic segmentation, and instance-level attention masks. Guided by object detections, our new panoptic segmentation head learns instance-specific soft attention masks based on spatial embeddings. The semantic masks for stuff classes and the soft instance masks for things classes are pixel-wise coherent and can be easily integrated into a panoptic output. The training and inference pipelines are simplified, and no post-processing of the panoptic output is necessary. Benefiting from its fast inference speed, the network can be deployed in automated vehicles or robotic applications. We perform extensive experiments on the COCO and Cityscapes datasets and obtain competitive results in both accuracy and inference time. On the Cityscapes dataset, we achieve a panoptic quality (PQ) of 59.7 at an inference speed of more than 10 FPS on high-resolution 1024 × 2048 images.
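
The abstract does not spell out how the semantic and instance outputs are combined. As a rough illustration of what a pixel-wise coherent fusion could look like, the minimal Python sketch below assigns each stuff pixel its semantic label and each thing pixel the same-class instance with the highest soft attention score. It is not the authors' implementation: the function name `fuse_panoptic`, the thing-class set, the `class_id * 1000 + instance index` encoding, and the fallback for undetected thing pixels are all assumptions made for this example.

```python
# Hypothetical sketch: fuse a semantic map with soft instance attention masks.
# The class split and the id encoding are illustrative assumptions, not
# details taken from the paper.

THING_CLASSES = {1}       # assumed: class 1 is a "thing" class (e.g. car)
INSTANCE_ID_BASE = 1000   # assumed encoding: class_id * 1000 + instance index


def fuse_panoptic(semantic, soft_masks, thing_classes=THING_CLASSES):
    """Return an H x W panoptic id map as nested lists.

    semantic:   H x W nested list of class ids (argmax of a semantic head).
    soft_masks: list of (class_id, H x W nested list of attention scores),
                one entry per detected object.
    """
    height, width = len(semantic), len(semantic[0])
    panoptic = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cls = semantic[y][x]
            if cls not in thing_classes:
                panoptic[y][x] = cls  # stuff pixel: keep the semantic label
                continue
            # Thing pixel: pick the same-class instance with the highest
            # attention score at this location.
            best_idx, best_score = None, float("-inf")
            for idx, (mask_cls, mask) in enumerate(soft_masks):
                if mask_cls == cls and mask[y][x] > best_score:
                    best_idx, best_score = idx, mask[y][x]
            if best_idx is None:
                panoptic[y][x] = cls  # assumed fallback: no matching detection
            else:
                panoptic[y][x] = cls * INSTANCE_ID_BASE + best_idx + 1
    return panoptic


# Toy 2 x 3 image: column 0 is stuff (class 0), columns 1-2 are things (class 1).
semantic = [[0, 1, 1],
            [0, 1, 1]]
soft_masks = [
    (1, [[0.1, 0.9, 0.2], [0.1, 0.8, 0.3]]),  # instance 1 dominates column 1
    (1, [[0.0, 0.1, 0.7], [0.0, 0.2, 0.6]]),  # instance 2 dominates column 2
]
print(fuse_panoptic(semantic, soft_masks))
# [[0, 1001, 1002], [0, 1001, 1002]]
```

Because every pixel receives exactly one label in a single pass, this kind of fusion produces no overlapping segments to resolve afterwards, which is one plausible reading of the claim that no post-processing of the panoptic output is necessary.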
