Li Ruixiang, Wang Zhen, Guo Jianxin, Zhang Chuanlei
School of Electronic Information, Xijing University, Xijing Road, Chang'an District, Xi'an 710123, China.
School of Computer Science, Northwestern Polytechnical University, Dongxiang Road, Chang'an District, Xi'an 710129, China.
J Imaging. 2025 Jun 9;11(6):188. doi: 10.3390/jimaging11060188.
Semantic segmentation plays a critical role in understanding complex urban environments, particularly for autonomous driving applications. However, existing approaches face significant challenges under low-light and adverse weather conditions. To address these limitations, we propose CSANet (Context Spatial Awareness Network), a novel framework that effectively integrates RGB and thermal infrared (TIR) modalities. CSANet employs an efficient encoder to extract complementary local and global features, while a hierarchical fusion strategy is adopted to selectively integrate visual and semantic information. Notably, the Channel-Spatial Cross-Fusion Module (CSCFM) enhances local details by fusing multi-modal features, and the Multi-Head Fusion Module (MHFM) captures global dependencies and calibrates multi-modal information. Furthermore, the Spatial Coordinate Attention Mechanism (SCAM) improves object localization accuracy in complex urban scenes. Evaluations on benchmark datasets (MFNet and PST900) demonstrate that CSANet achieves state-of-the-art performance, significantly advancing RGB-T semantic segmentation.
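The abstract's channel-spatial cross-fusion idea (re-weighting each modality by the other's statistics, then blending them with a spatial gate) can be illustrated with a minimal sketch. This is a hedged, simplified NumPy illustration, not the paper's CSCFM: the cross-channel gating via sigmoid of the other modality's pooled descriptor and the spatial blending gate are assumptions for exposition only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_cross_fuse(rgb, tir):
    """Toy cross-fusion of two (C, H, W) feature maps.

    Each modality is re-weighted by a channel descriptor of the
    other (cross-modal channel gating), then a single spatial gate
    blends the two weighted maps into one fused map.
    """
    # Channel descriptors via global average pooling -> (C, 1, 1)
    rgb_desc = rgb.mean(axis=(1, 2), keepdims=True)
    tir_desc = tir.mean(axis=(1, 2), keepdims=True)

    # Cross-modal channel gating: each stream scaled by the
    # sigmoid of the *other* stream's channel statistics.
    rgb_w = rgb * sigmoid(tir_desc)
    tir_w = tir * sigmoid(rgb_desc)

    # Spatial gate (1, H, W) from the channel-mean of the sum;
    # convexly blends the two modalities at every pixel.
    gate = sigmoid((rgb_w + tir_w).mean(axis=0, keepdims=True))
    return gate * rgb_w + (1.0 - gate) * tir_w

fused = channel_spatial_cross_fuse(np.ones((4, 8, 8)), np.zeros((4, 8, 8)))
```

In a real network the pooled descriptors would pass through learned layers before gating; the fixed sigmoid here only demonstrates the data flow of cross-modal re-weighting followed by spatial blending.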