Transformer-Based Model with Dynamic Attention Pyramid Head for Semantic Segmentation of VHR Remote Sensing Imagery

Authors

Xu Yufen, Zhou Shangbo, Huang Yuhui

Affiliation

College of Computer Science, Chongqing University, Chongqing 400044, China.

Publication

Entropy (Basel). 2022 Nov 6;24(11):1619. doi: 10.3390/e24111619.

DOI: 10.3390/e24111619
PMID: 36359709
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9689728/
Abstract

Convolutional neural networks have long dominated semantic segmentation of very-high-resolution (VHR) remote sensing (RS) images. However, because the convolution operation has a fixed receptive field, convolution-based models cannot directly obtain contextual information. Swin Transformer, by contrast, has great potential for modeling long-range dependencies. Nevertheless, it breaks images into patches that are treated as one-dimensional sequences, without considering the loss of position information inside each patch. Inspired by Swin Transformer and Unet, we therefore propose SUD-Net (Swin transformer-based Unet-like with Dynamic attention pyramid head Network), a new U-shaped architecture that combines Swin Transformer blocks and convolution layers through a dual encoder and an upsampling decoder, with a Dynamic Attention Pyramid Head (DAPH) attached to the backbone. First, we propose a dual encoder structure that combines Swin Transformer blocks and reslayers in reverse order to complement global semantics with detailed representations. Second, to address the spatial loss problem inside each patch, we design a Multi-Path Fusion Model (MPFM) with a specially devised Patch Attention (PA) that encodes the position information of patches and adaptively fuses features of different scales through attention mechanisms. Third, we construct a Dynamic Attention Pyramid Head with deformable convolution to dynamically aggregate effective and important semantic information. SUD-Net achieves 92.51% mF1, 86.4% mIoU, and 92.98% OA on the ISPRS Potsdam dataset, and 89.49% mF1, 81.26% mIoU, and 90.95% OA on the ISPRS Vaihingen dataset.
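
The three reported scores (mF1, mIoU, OA) are commonly derived from a per-pixel confusion matrix. The sketch below assumes the standard definitions: OA is the fraction of correctly labeled pixels, and mF1 and mIoU are unweighted means of the per-class F1 and IoU. The abstract does not state the paper's exact evaluation protocol (for example, which classes are averaged), so the function names and the toy labels here are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index the reference class, columns index the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_scores(cm: List[List[int]]) -> Dict[str, float]:
    """Return OA, mF1 and mIoU under the standard (assumed) definitions."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    f1s, ious = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c pixels that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp     # other pixels labeled as class c
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)
    return {
        "OA": sum(cm[c][c] for c in range(n)) / total if total else 0.0,
        "mF1": sum(f1s) / n,
        "mIoU": sum(ious) / n,
    }


# Toy example: 6 pixels, 2 classes (hypothetical labels, not data from the paper).
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
scores = segmentation_scores(confusion_matrix(y_true, y_pred, 2))
print(scores)
```

For a single class, F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU. This fits the reported mF1 values being higher than the corresponding mIoU values on both datasets.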

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4db/9689728/8ca8352e04e9/entropy-24-01619-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4db/9689728/53298695341e/entropy-24-01619-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4db/9689728/bb6a230a08de/entropy-24-01619-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4db/9689728/b7e09b021279/entropy-24-01619-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4db/9689728/0cd7573796b7/entropy-24-01619-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4db/9689728/9bc5715ea05a/entropy-24-01619-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4db/9689728/b6c2ca7fca5f/entropy-24-01619-g007.jpg