Rethinking 1D convolution for lightweight semantic segmentation.

Authors

Zhang Chunyu, Xu Fang, Wu Chengdong, Xu Chenglong

Affiliations

Faculty of Robot Science and Engineering, Northeastern University, Shenyang, China.

Shenyang Siasun Robot & Automation Company Ltd., Shenyang, China.

Publication information

Front Neurorobot. 2023 Feb 9;17:1119231. doi: 10.3389/fnbot.2023.1119231. eCollection 2023.

Abstract

Lightweight semantic segmentation makes semantic segmentation practical on small devices. Existing lightweight semantic segmentation networks suffer from low precision and a large number of parameters. To address these problems, we designed a fully 1D convolutional lightweight semantic segmentation network (LSNet). The network's performance is attributed to three modules: the 1D multi-layer space module (1D-MS), the 1D multi-layer channel module (1D-MC), and the flow alignment module (FA). The 1D-MS and 1D-MC modules add global feature extraction operations based on the multi-layer perceptron (MLP) idea. They use 1D convolutional coding, which is more flexible than an MLP, and the added global information operations improve the encoding ability of the features. The FA module fuses high-level and low-level semantic information, which addresses the precision loss caused by feature misalignment. We designed a 1D-mixer encoder based on the transformer structure. It performs fusion encoding of the spatial feature information extracted by the 1D-MS module and the channel information extracted by the 1D-MC module. The 1D-mixer obtains high-quality encoded features with very few parameters, which is the key to the network's success. The attention pyramid with FA (AP-FA) uses an attention pyramid (AP) to decode features and adds an FA module to address feature misalignment. Our network requires no pre-training and needs only a 1080Ti GPU for training. It achieved 72.6 mIoU and 95.6 FPS on the Cityscapes dataset and 70.5 mIoU and 122 FPS on the CamVid dataset. We ported the network trained on the ADE20K dataset to mobile devices, where a latency of 224 ms demonstrates its practical value on such devices. The results on the three datasets indicate that the network generalizes well. Compared with state-of-the-art lightweight semantic segmentation algorithms, our network achieves the best balance between segmentation accuracy and parameter count. With only 0.62 M parameters, LSNet is currently the network with the highest segmentation accuracy among those under 1 M parameters.
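The abstract credits the low parameter count to an all-1D-convolution design but does not give layer shapes. As a rough illustration only, the sketch below compares the weight count of a standard k x k 2D convolution with that of a k x 1 convolution followed by a 1 x k convolution. The factorization, the channel width of 64, and the function names `conv2d_params` and `conv1d_pair_params` are assumptions made for this example; they are not taken from the paper.

```python
def conv2d_params(c_in, c_out, k, bias=False):
    """Weight count of a standard k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def conv1d_pair_params(c_in, c_out, k, bias=False):
    """Weight count of a k x 1 convolution followed by a 1 x k convolution.

    Assumes the first layer maps c_in -> c_out channels and the second
    keeps c_out channels (a hypothetical layout for illustration).
    """
    first = c_in * c_out * k + (c_out if bias else 0)
    second = c_out * c_out * k + (c_out if bias else 0)
    return first + second


if __name__ == "__main__":
    channels = 64  # assumed width, not a value from the paper
    for k in (3, 5, 7):
        full = conv2d_params(channels, channels, k)
        pair = conv1d_pair_params(channels, channels, k)
        print(f"k={k}: 2D={full}, 1D pair={pair}, ratio={pair / full:.2f}")
```

With equal input and output widths and no bias, the 1D pair uses 2/k of the weights of the 2D kernel, so the saving grows with kernel size. This is only one plausible reading of why 1D kernels help keep a network under 1 M parameters.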

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8fb5/9947531/3cfb09214a9b/fnbot-17-1119231-g001.jpg
