Hu Xuegang, Feng Jing
School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
Chongqing Key Laboratory of Signal and Information Processing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
Sensors (Basel). 2023 Dec 24;24(1):95. doi: 10.3390/s24010095.
Semantic segmentation provides accurate scene understanding and decision support for many applications. However, many models pursue high accuracy through complex structures, which slows inference and makes real-time requirements hard to meet. To address this issue, we propose a fast attention-guided hierarchical decoding network for real-time semantic segmentation (FAHDNet), built as an asymmetric U-shaped structure. In the encoder, we design a multi-scale bottleneck residual unit (MBRU) that combines an attention mechanism with decomposition convolution in a parallel structure to aggregate multi-scale information, improving the network's handling of information at different scales. In addition, we propose a spatial information compensation (SIC) module that uses the original input to compensate for the spatial texture information lost during downsampling. In the decoder, a global attention (GA) module processes the encoder's feature maps, strengthening feature interaction across the channel and spatial dimensions and improving the network's ability to mine feature information. At the same time, the lightweight hierarchical decoder integrates multi-scale features so that objects of different sizes are segmented accurately. In experiments on two public datasets, Cityscapes and CamVid, FAHDNet achieves 70.6% mean intersection over union (mIoU) at 135 frames per second (FPS) on Cityscapes and 67.2% mIoU at 335 FPS on CamVid. Compared with existing networks, our model maintains accuracy while achieving faster inference, which enhances its practical usability.
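For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mean intersection over union is commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou`, the `ignore_index` value of 255 (a common convention for unlabeled pixels), and the toy label arrays are all assumptions made for illustration.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (true class, predicted class) pair.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index  # drop unlabeled pixels
    # Encode each (true, predicted) pair as a single integer bin.
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    # Row sums are ground-truth pixels, column sums are predicted pixels.
    union = conf.sum(axis=1) + conf.sum(axis=0) - tp
    present = union > 0  # skip classes absent from both prediction and target
    return float((tp[present] / union[present]).mean())

# Toy 2x3 label maps with three classes; one pixel is unlabeled (255).
target = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
conf = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(conf)
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so `miou` is 13/18 (about 0.722). For a dataset-level score, confusion matrices are typically summed over all images before `mean_iou` is applied once.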