Zhang Pingping, Liu Wei, Lei Yinjie, Wang Hongyu, Lu Huchuan
IEEE Trans Image Process. 2020 Mar 10. doi: 10.1109/TIP.2020.2978339.
Street Scene Parsing (SSP) is a fundamental step for autonomous driving and traffic scene understanding. Recently, Fully Convolutional Network (FCN) based methods have delivered impressive performance with the help of large-scale, densely labeled datasets. However, in urban traffic environments, not all labels contribute equally to control decisions. Certain labels, such as pedestrian, car, bicyclist, road lane, or sidewalk, are more important than labels for vegetation, sky, or building. Based on this observation, we propose a novel deep learning framework, named Residual Atrous Pyramid Network (RAPNet), for importance-aware SSP. More specifically, to incorporate the importance of various object classes, we propose an Importance-Aware Feature Selection (IAFS) mechanism that automatically selects the important features for label prediction. The IAFS can operate in each convolutional block, and semantic features with different importance are captured in different channels so that they are automatically assigned corresponding weights. To enhance labeling coherence, we also propose a Residual Atrous Spatial Pyramid (RASP) module that sequentially aggregates global-to-local context information in a residual refinement manner. Extensive experiments on two public benchmarks show that our approach achieves new state-of-the-art performance and consistently obtains more accurate results on the semantic classes with high importance levels.
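The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer details, so the sigmoid channel gate standing in for IAFS, the single-channel 3x3 atrous filter, the dilation rates `(12, 6, 3)`, and the function names `channel_gate`, `dilated_conv3x3`, and `residual_atrous_pyramid` are all assumptions made for illustration.

```python
import numpy as np


def channel_gate(features, w, b):
    """Reweight channels with sigmoid gates (an assumed stand-in for IAFS).

    features: array of shape (C, H, W); w: (C, C); b: (C,).
    Returns the reweighted features and the per-channel gates.
    """
    pooled = features.mean(axis=(1, 2))               # global average pool, (C,)
    gates = 1.0 / (1.0 + np.exp(-(w @ pooled + b)))   # one weight in (0, 1) per channel
    return features * gates[:, None, None], gates


def dilated_conv3x3(x, kernel, rate):
    """Single-channel 3x3 atrous filter (cross-correlation) with zero padding.

    x: (H, W); kernel: (3, 3); rate: dilation rate (gap between taps).
    The output has the same shape as x.
    """
    h, w = x.shape
    padded = np.pad(x, rate)                          # zero-pad every side by `rate`
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            # Tap (i, j) reads the input shifted by (i - 1) * rate, (j - 1) * rate.
            out += kernel[i, j] * padded[i * rate:i * rate + h,
                                         j * rate:j * rate + w]
    return out


def residual_atrous_pyramid(x, kernels, rates=(12, 6, 3)):
    """Sequential residual refinement from large to small dilation rates.

    The rates are hypothetical; large rates see wide (global) context,
    small rates see local context.
    """
    out = x.astype(float)
    for kernel, rate in zip(kernels, rates):
        out = out + dilated_conv3x3(out, kernel, rate)  # residual update
    return out
```

In this sketch each stage adds its atrous response to the running output, so later stages with smaller rates refine what the wider-context stages produced. A trained network would learn `w`, `b`, and the kernels, and would operate on multi-channel feature maps rather than a single plane.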