BMSeNet: Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network for Real-Time Semantic Segmentation.

Author information

Zhao Shan, Zhao Xin, Huo Zhanqiang, Zhang Fukai

Affiliation

School of Software, Henan Polytechnic University, Jiaozuo 454000, China.

Publication information

Sensors (Basel). 2024 Aug 9;24(16):5145. doi: 10.3390/s24165145.

Abstract

Most real-time semantic segmentation networks use shallow architectures to achieve fast inference. This approach, however, limits a network's receptive field. In addition, feature extraction is restricted to a single scale, which reduces the network's ability to generalize and remain robust. Furthermore, the loss of spatial detail in the image degrades segmentation accuracy. To address these limitations, this paper proposes a Multiscale Context Pyramid Pooling and Spatial Detail Enhancement Network (BMSeNet). First, to overcome the restriction to single-scale semantic features, a Multiscale Context Pyramid Pooling Module (MSCPPM) is introduced. By combining several pooling operations, this module efficiently enlarges the receptive field and better aggregates multiscale contextual information. Second, a Spatial Detail Enhancement Module (SDEM) is designed to compensate for lost spatial detail and significantly enhance the network's perception of spatial details. Finally, a Bilateral Attention Fusion Module (BAFM) is proposed. This module uses pixel positional correlations to guide the network in assigning appropriate weights to the features extracted from the two branches, so that the information from both branches is merged effectively. Extensive experiments were conducted on the Cityscapes and CamVid datasets. The results show that BMSeNet achieves a good balance between inference speed and segmentation accuracy, outperforming some state-of-the-art real-time semantic segmentation methods.
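
To make the idea of aggregating context through several pooling operations concrete, the following is a minimal sketch of generic pyramid pooling on a single-channel feature map, written in plain Python. It is not the MSCPPM described in the paper, whose internal design the abstract does not specify. The function names (`adaptive_avg_pool`, `upsample_nearest`, `pyramid_pool`), the bin sizes (1, 2, 4), the use of average pooling, and nearest-neighbour upsampling are all assumptions chosen for illustration.

```python
# Illustrative pyramid pooling on one H x W feature map (pure Python).
# Assumption: average pooling at bin sizes 1, 2 and 4, nearest-neighbour
# upsampling, and channel stacking. This is NOT the paper's MSCPPM.

def adaptive_avg_pool(fmap, bins):
    """Average-pool an H x W map down to a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Row window: floor of the start, ceiling of the end.
        r0, r1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled, h, w):
    """Resize a bins x bins grid back to H x W by nearest neighbour."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_pool(fmap, bin_sizes=(1, 2, 4)):
    """Return the input map plus one context map per pooling scale."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]  # keep the original features alongside the context
    for bins in bin_sizes:
        coarse = adaptive_avg_pool(fmap, bins)
        channels.append(upsample_nearest(coarse, h, w))
    return channels
```

Calling `pyramid_pool` on a 4 x 4 map returns four maps of the same size: the input, a global average (bin size 1), a 2 x 2 regional average, and a 4 x 4 average that here equals the input. Coarser bins summarize larger regions, which is how pooling of this kind widens the effective receptive field without adding depth to the network.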

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e3a/11360195/d6272a316219/sensors-24-05145-g001.jpg
