Ye Xin, Gao Lang, Chen Jichen, Lei Mingyue
Institute of Artificial Intelligence and Data Science, Xi'an Technological University, Xi'an, China.
Computer Part III, Xi'an Microelectronics Technology Institute, Xi'an, China.
Front Neurorobot. 2023 Aug 31;17:1204418. doi: 10.3389/fnbot.2023.1204418. eCollection 2023.
Semantic segmentation is a fundamental task in computer vision in which every pixel is assigned a specific semantic class. High-accuracy segmentation algorithms are difficult to deploy on embedded systems and mobile devices, and despite the rapid development of semantic segmentation, the balance between speed and accuracy still needs improvement. To address these problems, we propose a cross-scale fusion attention mechanism network called CFANet, which fuses feature maps from different scales. We first design a novel efficient residual module (ERM) that applies both dilated convolution and factorized convolution; CFANet is mainly constructed from ERMs. We then design a new multi-branch channel attention mechanism (MCAM) to refine the feature maps at different levels. Experimental results show that CFANet achieves 70.6% and 67.7% mean intersection over union (mIoU) on the Cityscapes and CamVid datasets, respectively, with only 0.84M parameters, at inference speeds of 118 FPS and 105 FPS on an NVIDIA RTX 2080Ti GPU.
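The efficiency argument behind combining factorized and dilated convolutions can be illustrated with a simple parameter count. This is a minimal sketch of the general technique, not the authors' exact ERM design: a K×K convolution is factorized into a K×1 followed by a 1×K convolution, cutting weights from C·C·K·K to 2·C·C·K, while dilation enlarges the receptive field at no parameter cost. The channel count (64) is an illustrative assumption.

```python
# Illustrative parameter count for conv factorization (not the paper's exact ERM).
# A KxK conv over C->C channels costs C*C*K*K weights; factorizing it into a
# Kx1 conv followed by a 1xK conv costs 2*C*C*K weights instead.
# Dilation changes the receptive field, not the weight count.

def conv_params(c_in, c_out, kh, kw):
    """Weight count of a 2D convolution layer (bias terms omitted)."""
    return c_in * c_out * kh * kw

channels, k = 64, 3  # assumed values for illustration
standard = conv_params(channels, channels, k, k)            # one 3x3 conv
factorized = (conv_params(channels, channels, k, 1)
              + conv_params(channels, channels, 1, k))      # 3x1 then 1x3

print(standard, factorized)  # 36864 24576 -> a 1.5x reduction
```

This kind of reduction is why factorized (asymmetric) convolutions are a common building block in real-time segmentation networks with sub-1M parameter budgets.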