UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images.

Author information

Chang Zhanyuan, Xu Mingyu, Wei Yuwen, Lian Jie, Zhang Chongming, Li Chuanjiang

Affiliation

College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China.

Publication information

Sensors (Basel). 2024 Oct 16;24(20):6655. doi: 10.3390/s24206655.

Abstract

The application of deep neural networks to the semantic segmentation of remote sensing images is a significant research area within the intelligent interpretation of remote sensing data. Semantic segmentation of remote sensing images has great practical value in urban planning, disaster assessment, carbon sink estimation, and related fields. As remote sensing technology advances, the spatial resolution of remote sensing images keeps increasing. Higher resolution brings challenges such as large variations in the scale of ground objects, redundant information, and irregular object shapes. Current methods leverage Transformers to capture global long-range dependencies. However, Transformers introduce higher computational complexity and are prone to losing local details. In this paper, we propose UNeXt (UNet+ConvNeXt+Transformer), a real-time semantic segmentation model tailored for high-resolution remote sensing images. To achieve efficient segmentation, UNeXt uses the lightweight ConvNeXt-T as the encoder and a lightweight decoder, Transnext, which combines a Transformer and a CNN (convolutional neural network) to capture global information while avoiding the loss of local details. Furthermore, to use spatial and channel information more effectively, we propose an SCFB (SC Feature Fuse Block) that reduces computational complexity while enhancing the model's recognition of complex scenes. Ablation experiments and comprehensive comparative experiments demonstrate that our method not only runs faster than state-of-the-art (SOTA) lightweight models but also achieves higher accuracy. Specifically, UNeXt achieves mIoUs of 85.2% and 82.9% on the Vaihingen and Gaofen5 (GID5) datasets, respectively, while maintaining 97 fps for 512 × 512 inputs on a single NVIDIA RTX 4090 GPU, outperforming other SOTA methods.
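The accuracy figures above are reported as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix over flattened pixel labels. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` are hypothetical, and skipping classes that never appear in either the labels or the predictions is an assumption, since the abstract does not describe how such classes are handled.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes with no ground-truth and no predicted pixels
    are skipped rather than counted as 0 or 1.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (flattened label maps).
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(labels, preds, num_classes=3))
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. On real data the confusion matrix would be accumulated over every pixel of the test set before the IoUs are averaged.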

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0274/11510939/3f3c058a2f0b/sensors-24-06655-g002.jpg
