Suppr 超能文献


Leveraging potential of limpid attention transformer with dynamic tokenization for hyperspectral image classification.

Authors

Yadav Dhirendra Prasad, Kumar Deepak, Jalal Anand Singh, Sharma Bhisham, Liatsis Panos

Affiliations

Department of Computer Engineering & Applications, G.L.A. University, Mathura, Uttar Pradesh, India.

Department of Computer Engineering, NIT Meghalaya, Shillong, Meghalaya, India.

Publication

PLoS One. 2025 Aug 4;20(8):e0328160. doi: 10.1371/journal.pone.0328160. eCollection 2025.

DOI:10.1371/journal.pone.0328160
PMID:40758709
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12321143/
Abstract

Hyperspectral data consist of continuous narrow spectral bands; as a result, they carry limited spatial but rich spectral information. Convolutional neural networks (CNNs) have emerged as highly contextual models for remote sensing applications. Unfortunately, CNNs are constrained by their underlying network architecture with regard to the global correlation of spatial and spectral features, making them less reliable for mining and representing the sequential properties of spectral signatures. In this article, a limpid size attention network (LSANet) is proposed, which contains 3D and 2D convolution blocks to enhance the spatial-spectral features of the hyperspectral image (HSI). In addition, a limpid attention block (LAB) is designed to provide global correlation of spectral and spatial features through LS attention. Furthermore, the computational cost of LS attention is lower than that of the multi-head self-attention (MHSA) of the classical vision transformer (ViT). In the ViT encoder, a conditional position encoding (CPE) module dynamically generates tokens from the feature maps to capture a richer contextual representation. LSANet obtained overall accuracies (OA) of 98.78%, 98.67%, 97.52% and 89.45% on the Indian Pines (IP), Pavia University (PU), Salinas Valley (SV) and Botswana datasets, respectively. The model's quantitative and qualitative results are considerably better than those of classical CNN- and transformer-based methods.
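The abstract states that LS attention is computationally cheaper than the multi-head self-attention (MHSA) of a classical ViT, but does not give its exact form. As an illustration only, the back-of-envelope FLOP counts below contrast quadratic-in-tokens MHSA with a generic kernelized linear attention; the cost formulas, token counts, and embedding width are assumptions for the sketch, not the paper's method.

```python
def mhsa_flops(n: int, d: int) -> int:
    """Approximate FLOPs for one MHSA layer on n tokens of width d:
    Q/K/V/output projections (4*n*d^2) plus the QK^T score matrix and
    the attention-weighted sum over V (2*n^2*d)."""
    return 4 * n * d * d + 2 * n * n * d

def linear_attention_flops(n: int, d: int) -> int:
    """Approximate FLOPs for a kernelized linear attention: the same
    projections (4*n*d^2) plus K^T V and Q(K^T V), each costing n*d^2."""
    return 4 * n * d * d + 2 * n * d * d

d = 64  # embedding width (hypothetical)
for n in (64, 256, 1024):  # token counts, e.g. patches from an HSI cube
    ratio = mhsa_flops(n, d) / linear_attention_flops(n, d)
    print(f"n={n}: MHSA costs ~{ratio:.1f}x a linear-style attention")
```

The point of the sketch is the scaling behavior: MHSA grows quadratically with the token count while the linear variant grows linearly, so the gap widens as more tokens are extracted from the HSI cube.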


Figures (g001–g014, from PMC):

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/f11e67441078/pone.0328160.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/881dc9ee66cc/pone.0328160.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/7224cf033f67/pone.0328160.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/4c2aa5396bc1/pone.0328160.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/f7c769ed4063/pone.0328160.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/b53c0c228380/pone.0328160.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/46851a90baaa/pone.0328160.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/613ced576dcb/pone.0328160.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/0714a41dec24/pone.0328160.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/ae761b30b5f0/pone.0328160.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/7f938797e087/pone.0328160.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/b5c6d1640715/pone.0328160.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/2d859bfa6232/pone.0328160.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/422a/12321143/d4d029858cc6/pone.0328160.g014.jpg

Similar articles

1
Leveraging potential of limpid attention transformer with dynamic tokenization for hyperspectral image classification.
PLoS One. 2025 Aug 4;20(8):e0328160. doi: 10.1371/journal.pone.0328160. eCollection 2025.
2
TLTNet: A novel transscale cascade layered transformer network for enhanced retinal blood vessel segmentation.
Comput Biol Med. 2024 Aug;178:108773. doi: 10.1016/j.compbiomed.2024.108773. Epub 2024 Jun 25.
3
LGF-Net: A multi-scale feature fusion network for thyroid nodule ultrasound image classification.
J Appl Clin Med Phys. 2025 Aug;26(8):e70149. doi: 10.1002/acm2.70149.
4
A spatial-spectral fusion convolutional transformer network with contextual multi-head self-attention for hyperspectral image classification.
Neural Netw. 2025 Jul;187:107350. doi: 10.1016/j.neunet.2025.107350. Epub 2025 Mar 14.
5
Short-Term Memory Impairment
6
WSDC-ViT: a novel transformer network for pneumonia image classification based on windows scalable attention and dynamic rectified linear unit convolutional modules.
Sci Rep. 2025 Jul 30;15(1):27868. doi: 10.1038/s41598-025-12117-0.
7
A novel deep learning framework for retinal disease detection leveraging contextual and local features cues from retinal images.
Med Biol Eng Comput. 2025 Feb 7. doi: 10.1007/s11517-025-03314-0.
8
Spectral-spatial wave and frequency interactive transformer for hyperspectral image classification.
Sci Rep. 2025 Jul 26;15(1):27259. doi: 10.1038/s41598-025-12489-3.
9
Application of an electronic tongue and hyperspectral imaging with a CNN-transformer fusion model for rapid detection of botanical origins of honey.
Anal Methods. 2025 Jul 31;17(30):6231-6244. doi: 10.1039/d4ay02222j.
10
DC-MSSFF Net: Dual-channel multi-scale spatial-spectral feature fusion network for cholangiocarcinoma pathology high-resolution hyperspectral image segmentation.
Comput Methods Programs Biomed. 2025 Sep;269:108905. doi: 10.1016/j.cmpb.2025.108905. Epub 2025 Jun 21.
