Guo Zhiyang, Hu Xing, Wang Jiejia, Miao XiaoYu, Sun MengTeng, Wang HuaiWei, Ma XueYing
School of Traffic Engineering, Jiangsu Shipping College, Nantong, China.
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science & Technology, Shanghai, 200093, China.
Sci Rep. 2024 Jul 29;14(1):17438. doi: 10.1038/s41598-024-68255-4.
Detecting roads in autonomous driving environments is challenging due to issues such as boundary fuzziness, occlusion, and light glare. We believe two factors are instrumental in addressing these challenges and enhancing detection performance: global context dependency and effective feature representation that prioritizes important feature channels. To tackle these issues, we introduce DTRoadseg, a novel duplex Transformer-based heterogeneous feature fusion network designed for road segmentation. DTRoadseg leverages a duplex encoder architecture to extract heterogeneous features from both RGB images and point-cloud depth images. Subsequently, we introduce a multi-source Heterogeneous Feature Reinforcement Block (HFRB) to fuse the encoded features, comprising a Heterogeneous Feature Fusion Module (HFFM) and a Reinforcement Fusion Module (RFM). The HFFM leverages the self-attention mechanism of Transformers to achieve effective fusion through token interactions, whereas the RFM emphasizes informative features and downplays less important ones, thereby reinforcing the feature fusion. Finally, a Transformer decoder produces the final semantic prediction. Furthermore, we employ a boundary loss function to optimize the segmented structure's boundary regions, reduce false detections, and improve model accuracy. Extensive experiments are carried out on the KITTI road dataset. The results demonstrate that, compared with state-of-the-art methods, DTRoadseg achieves superior performance, with an average accuracy of 97.01%, a recall of 96.35%, and a runtime of 0.09 s per image.
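The abstract's two-stage fusion idea (attention-based token interaction in the HFFM, followed by channel reweighting in the RFM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cross-attention form, the squeeze-and-excitation-style gating, and the token/channel sizes below are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hffm_token_fusion(rgb_tokens, depth_tokens):
    """Hypothetical HFFM: RGB tokens attend to depth tokens and the
    attended depth features are added back residually (token interaction)."""
    d = rgb_tokens.shape[-1]
    attn = softmax(rgb_tokens @ depth_tokens.T / np.sqrt(d))
    return rgb_tokens + attn @ depth_tokens

def rfm_channel_reweight(tokens):
    """Hypothetical RFM: squeeze-and-excitation-style gating that scales
    each channel by a sigmoid of its global mean, emphasizing informative
    channels and downplaying weak ones."""
    squeeze = tokens.mean(axis=0)            # global pooling per channel
    gate = 1.0 / (1.0 + np.exp(-squeeze))    # per-channel sigmoid gate in (0, 1)
    return tokens * gate

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))     # 16 tokens, 8 channels (RGB branch)
depth = rng.standard_normal((16, 8))   # depth branch from point-cloud image
fused = rfm_channel_reweight(hffm_token_fusion(rgb, depth))
print(fused.shape)  # (16, 8)
```

In the actual network these operations would act on learned multi-head projections of encoder feature maps; the sketch only shows how attention mixes the two modalities while the gate rescales channels.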