IEEE Trans Image Process. 2021;30:1866-1881. doi: 10.1109/TIP.2020.3048682. Epub 2021 Jan 18.
Semantic segmentation, unifying most navigational perception tasks at the pixel level has catalyzed striking progress in the field of autonomous transportation. Modern Convolution Neural Networks (CNNs) are able to perform semantic segmentation both efficiently and accurately, particularly owing to their exploitation of wide context information. However, most segmentation CNNs are benchmarked against pinhole images with limited Field of View (FoV). Despite the growing popularity of panoramic cameras to sense the surroundings, semantic segmenters have not been comprehensively evaluated on omnidirectional wide-FoV data, which features rich and distinct contextual information. In this paper, we propose a concurrent horizontal and vertical attention module to leverage width-wise and height-wise contextual priors markedly available in the panoramas. To yield semantic segmenters suitable for wide-FoV images, we present a multi-source omni-supervised learning scheme with panoramic domain covered in the training via data distillation. To facilitate the evaluation of contemporary CNNs in panoramic imagery, we put forward the Wild PAnoramic Semantic Segmentation (WildPASS) dataset, comprising images from all around the globe, as well as adverse and unconstrained scenes, which further reflects perception challenges of navigation applications in the real world. A comprehensive variety of experiments demonstrates that the proposed methods enable our high-efficiency architecture to attain significant accuracy gains, outperforming the state of the art in panoramic imagery domains.
语义分割在像素级别上统一了大多数导航感知任务,这极大地推动了自主运输领域的发展。现代卷积神经网络(CNN)能够高效准确地执行语义分割,这主要归功于它们对广泛上下文信息的利用。然而,大多数分割 CNN 都是针对视场有限的针孔图像进行基准测试的。尽管全景相机越来越受欢迎,可以用于感知周围环境,但全景宽视场数据上的语义分割器还没有得到全面评估,全景宽视场数据具有丰富而独特的上下文信息。在本文中,我们提出了一种水平和垂直注意力模块,以显著利用全景图像中的宽度和高度上下文先验。为了生成适用于宽视场图像的语义分割器,我们提出了一种多源全景监督学习方案,通过数据蒸馏在训练中涵盖全景域。为了促进当代 CNN 在全景图像中的评估,我们提出了 Wild PAnoramic Semantic Segmentation(WildPASS)数据集,其中包含来自全球各地的图像,以及恶劣和不受约束的场景,这进一步反映了导航应用在现实世界中的感知挑战。大量的实验表明,所提出的方法使我们的高效架构能够显著提高精度,在全景图像领域超越了现有技术。