Yang Zhengeng, Yu Hongshan, Feng Mingtao, Sun Wei, Lin Xuefei, Sun Mingui, Mao Zhi-Hong, Mian Ajmal
IEEE Trans Image Process. 2020 Mar 18. doi: 10.1109/TIP.2020.2976856.
Semantic segmentation is a key step in scene understanding for autonomous driving. Although deep learning has significantly improved the segmentation accuracy, current high-quality models such as PSPNet and DeepLabV3 are inefficient given their complex architectures and reliance on multi-scale inputs. Thus, it is difficult to apply them to real-time or practical applications. On the other hand, existing real-time methods cannot yet produce satisfactory results on small objects such as traffic lights, which are imperative to safe autonomous driving. In this paper, we improve the performance of real-time semantic segmentation from two perspectives, methodology and data. Specifically, we propose a real-time segmentation model coined Narrow Deep Network (NDNet) and build a synthetic dataset by inserting additional small objects into the training images. The proposed method achieves 65.7% mean intersection over union (mIoU) on the Cityscapes test set with only 8.4G floating-point operations (FLOPs) on 1024×2048 inputs. Furthermore, by re-training the existing PSPNet and DeepLabV3 models on our synthetic dataset, we obtained an average 2% mIoU improvement on small objects.