Electronic Information School, Wuhan University, Wuhan, 430072, China.
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China.
Neural Netw. 2021 May;137:188-199. doi: 10.1016/j.neunet.2021.01.021. Epub 2021 Jan 30.
The encoder-decoder structure has been introduced into semantic segmentation to improve the spatial accuracy of the network by fusing high- and low-level feature maps. However, recent state-of-the-art encoder-decoder-based methods struggle to meet real-time requirements because of their complex and inefficient decoders. To address this issue, we propose a lightweight bilateral attention decoder for real-time semantic segmentation. It consists of two blocks and fuses feature maps from different levels in two steps: information refinement and information fusion. In the first step, we propose a channel attention branch to refine the high-level feature maps and a spatial attention branch to refine the low-level ones. The refined high-level feature maps can capture more exact semantic information and the refined low-level ones can capture more accurate spatial information, which significantly improves the information-capturing ability of these feature maps. In the second step, we develop a new fusion module, named the pooling fusing block, to fuse the refined high- and low-level feature maps. This fusion block can take full advantage of the high- and low-level feature maps, leading to high-quality fusion results. To verify the efficiency of the proposed bilateral attention decoder, we adopt a lightweight network as the backbone and compare our method with other state-of-the-art real-time semantic segmentation methods on the Cityscapes and CamVid datasets. Experimental results demonstrate that our method can achieve better performance at a higher inference speed. Moreover, we compare our network with several state-of-the-art non-real-time semantic segmentation methods and find that it can also attain better segmentation performance.
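The two-step idea of refinement followed by fusion can be illustrated with a minimal NumPy sketch. The abstract does not specify the internals of either attention branch or of the pooling fusing block, so everything below is an assumption made for illustration: the function names are hypothetical, the gates are computed directly from pooled statistics without the learned layers a real network would use, and the fusion step is a plain upsample-and-add stand-in, not the authors' block.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(high):
    """Assumed form: gate each channel of a (C, H, W) map by its global average."""
    gate = sigmoid(high.mean(axis=(1, 2), keepdims=True))  # shape (C, 1, 1)
    return high * gate


def spatial_attention(low):
    """Assumed form: gate each position of a (C, H, W) map by its channel-wise mean."""
    gate = sigmoid(low.mean(axis=0, keepdims=True))  # shape (1, H, W)
    return low * gate


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling along both spatial axes."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse(high, low):
    """Stand-in fusion (NOT the pooling fusing block): upsample, then add."""
    factor = low.shape[1] // high.shape[1]
    return upsample_nearest(high, factor) + low


rng = np.random.default_rng(0)
high = rng.standard_normal((8, 4, 4))    # coarse, semantically rich features
low = rng.standard_normal((8, 16, 16))   # fine, spatially detailed features

fused = fuse(channel_attention(high), spatial_attention(low))
print(fused.shape)  # (8, 16, 16)
```

The sketch only shows how a per-channel gate and a per-position gate act on feature maps of different resolution before they are merged; it makes no claim about accuracy or speed.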