Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 106, Taiwan.
Sensors (Basel). 2021 Dec 2;21(23):8072. doi: 10.3390/s21238072.
As autonomous driving techniques become more valued and widespread, real-time semantic segmentation has become a popular and challenging topic in deep learning and computer vision. However, to deploy a deep learning model on the edge devices that accompany sensors on vehicles, we need to design a structure with the best trade-off between accuracy and inference time. Among previous works, some methods sacrificed accuracy to obtain faster inference, while others sought the best accuracy achievable under a real-time constraint. Nevertheless, the accuracy of previous real-time semantic segmentation methods still lags well behind that of general semantic segmentation methods. We therefore propose a network architecture based on a dual encoder and a self-attention mechanism. Our method achieves 78.6% mIoU at 39.4 FPS with a 1024 × 2048 resolution on a Cityscapes test submission.
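For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou` are hypothetical, and the default ignore label of 255 is an assumption that follows the usual Cityscapes convention.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) label pairs; rows are ground truth, columns are predictions."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    mask = target != ignore_index  # drop pixels with no valid label (assumed convention)
    # Encode each (target, pred) pair as a single index, then histogram the indices.
    idx = num_classes * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Average per-class IoU = TP / (TP + FP + FN), skipping classes that never occur."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # TP + FP + FN per class
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

# Toy 2x2 label maps with two classes.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(mean_iou(confusion_matrix(pred, target, num_classes=2)))
```

In a dataset-level evaluation, the confusion matrices of all images are typically summed before `mean_iou` is applied, so that each class IoU reflects the whole test set rather than an average of per-image scores.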