Yuan Ying, Du Yu, Ma Yan, Lv Hejun
Beijing Key Laboratory of Information Service Engineering, College of Robotics, Beijing Union University, Beijing 100101, China.
Sensors (Basel). 2024 Sep 20;24(18):6075. doi: 10.3390/s24186075.
In modern urban environments, visual sensors are crucial for enhancing the functionality of navigation systems, particularly for devices designed for visually impaired individuals. The high-resolution images captured by these sensors form the basis for understanding the surrounding environment and identifying key landmarks. However, the core challenge in the semantic segmentation of blind roads lies in the effective extraction of global context and edge features. Most existing methods rely on Convolutional Neural Networks (CNNs), whose inherent inductive biases limit their ability to capture global context and to accurately detect discontinuous features such as gaps and obstructions in blind roads. To overcome these limitations, we introduce the Dual-Branch Swin-CNN Net (DSC-Net), a new method that integrates the global modeling capability of the Swin Transformer with the CNN-based U-Net architecture. This combination allows for the hierarchical extraction of both fine and coarse features. The Spatial Blending Module (SBM) mitigates the blurring of target information caused by object occlusion, improving accuracy. The Hybrid Attention Module (HAM), embedded within the Inverted Residual Module (IRM), sharpens the detection of blind road boundaries, while the IRM itself improves the speed of network processing. In tests on a specialized dataset designed for blind road semantic segmentation in real-world scenarios, our method achieved an mIoU of 97.72%. It also demonstrated strong performance on other public datasets.
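To make the IRM-with-embedded-HAM idea concrete, the sketch below shows one plausible arrangement: a MobileNetV2-style inverted residual block with a CBAM-style channel-plus-spatial attention module inserted after the depthwise convolution. The class names, the attention design, and the placement of the attention inside the block are assumptions for illustration only; they are not taken from the paper's actual implementation.

```python
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    """Channel + spatial attention (CBAM-style); an assumed stand-in for HAM."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: re-weight locations from pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values], dim=1
        )
        return x * self.spatial_gate(pooled)


class InvertedResidual(nn.Module):
    """MobileNetV2-style inverted residual block with the attention module
    embedded after the depthwise convolution (assumed placement)."""

    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),          # expand
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),                # depthwise
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            HybridAttention(hidden),                             # hybrid attention
            nn.Conv2d(hidden, channels, 1, bias=False),          # project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)                                 # residual connection


if __name__ == "__main__":
    feat = torch.randn(1, 64, 56, 56)
    print(InvertedResidual(64)(feat).shape)  # torch.Size([1, 64, 56, 56])
```

The depthwise-separable structure keeps the block lightweight, which is consistent with the abstract's claim that the IRM improves processing speed, while the attention gate is where boundary-sensitive re-weighting would occur.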