HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation.

Authors

Xu Guoan, Jia Wenjing, Wu Tao, Chen Ligeng, Gao Guangwei

Publication

IEEE Trans Image Process. 2024;33:4202-4214. doi: 10.1109/TIP.2024.3425048. Epub 2024 Jul 22.

Abstract

Both Convolutional Neural Networks (CNNs) and Transformers have shown great success in semantic segmentation tasks. Efforts have been made to integrate CNNs with Transformer models to capture both local and global context interactions. However, there is still room for enhancement, particularly when considering constraints on computational resources. In this paper, we introduce HAFormer, a model that combines the hierarchical features extraction ability of CNNs with the global dependency modeling capability of Transformers to tackle lightweight semantic segmentation challenges. Specifically, we design a Hierarchy-Aware Pixel-Excitation (HAPE) module for adaptive multi-scale local feature extraction. During the global perception modeling, we devise an Efficient Transformer (ET) module streamlining the quadratic calculations associated with traditional Transformers. Moreover, a correlation-weighted Fusion (cwF) module selectively merges diverse feature representations, significantly enhancing predictive accuracy. HAFormer achieves high performance with minimal computational overhead and compact model size, achieving 74.2% mIoU on Cityscapes and 71.1% mIoU on CamVid test datasets, with frame rates of 105FPS and 118FPS on a single 2080Ti GPU. The source codes are available at https://github.com/XU-GITHUB-curry/HAFormer.
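The correlation-weighted fusion (cwF) idea described above can be illustrated with a toy sketch. Note the function name, the use of plain Pearson correlation, and the sigmoid gate are all assumptions made here for illustration; the paper's actual cwF module operates on multi-channel feature maps with learned projections rather than on flat vectors.

```python
import math

def correlation_weighted_fusion(f_cnn, f_trans):
    """Toy sketch of correlation-weighted fusion of two feature vectors.

    Computes a Pearson-style correlation between the CNN (local) and
    Transformer (global) features, turns it into a gate in (0, 1) via a
    sigmoid, and fuses the vectors as a convex combination. This is an
    illustrative simplification, not the paper's cwF formulation.
    """
    n = len(f_cnn)
    mean_c = sum(f_cnn) / n
    mean_t = sum(f_trans) / n
    cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(f_cnn, f_trans))
    std_c = math.sqrt(sum((c - mean_c) ** 2 for c in f_cnn))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in f_trans))
    corr = cov / (std_c * std_t + 1e-8)      # Pearson correlation in [-1, 1]
    w = 1.0 / (1.0 + math.exp(-corr))        # sigmoid gate -> weight in (0, 1)
    # Highly correlated features lean toward the CNN branch; anti-correlated
    # features lean toward the Transformer branch.
    return [w * c + (1.0 - w) * t for c, t in zip(f_cnn, f_trans)]
```

Because the gate is a convex weight, each fused element always lies between the two input features, so the fusion selectively blends rather than amplifies either representation.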

