Khan Rahim, Arshad Tahir, Ma Xuefei, Zhu Haifeng, Wang Chen, Khan Javed, Khan Zahid Ullah, Khan Sajid Ullah
College of Information and Communication Engineering, Harbin Engineering University, Harbin, 150001, China.
School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, 150001, China.
Sci Rep. 2024 Oct 12;14(1):23879. doi: 10.1038/s41598-024-74835-1.
Hyperspectral image (HSI) data contains rich spectral information that is valuable for numerous tasks, but it also poses challenges such as limited training samples, data scarcity, and redundant information. Researchers have proposed a variety of methods to address these challenges. Convolutional neural networks (CNNs) have achieved significant success in HSI classification; however, CNNs mainly extract low-level features and, owing to their confined filter size, have a limited ability to capture long-range dependencies. In contrast, vision transformers perform well in HSI classification because their attention mechanisms learn long-range dependencies, yet these models require sufficient labeled training data. To address this challenge, we propose a spectral-spatial feature extractor group attention transformer, which combines a multiscale feature extractor that extracts low-level (shallow) features with a group attention mechanism that extracts high-level semantic features. The proposed model is evaluated on four publicly available HSI datasets: Indian Pines, Pavia University, Salinas, and KSC. It achieves the best classification results in terms of overall accuracy (OA), average accuracy (AA), and Kappa coefficient while using only 5%, 1%, 1%, and 10% of the samples of these four datasets for training, respectively.
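The sketch below illustrates, in PyTorch, the two components named in the abstract: a multiscale convolutional extractor for shallow spectral-spatial features and a group attention block for higher-level semantic features. The class names, kernel sizes, group/head counts, and patch dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch, assuming a 2-D multiscale extractor over band-channel patches
# and channel-grouped self-attention; hyperparameters are hypothetical.
import torch
import torch.nn as nn


class MultiScaleExtractor(nn.Module):
    """Parallel convolutions with different kernel sizes over an HSI patch."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        branch = out_channels // 3
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, branch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)                      # three receptive-field scales
        ])
        self.fuse = nn.Conv2d(branch * 3, out_channels, kernel_size=1)

    def forward(self, x):                           # x: (B, bands, H, W)
        feats = [b(x) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1))   # fused shallow features


class GroupAttentionBlock(nn.Module):
    """Splits token channels into groups and runs self-attention within each group."""
    def __init__(self, dim: int, groups: int = 4, heads: int = 2):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.ModuleList([
            nn.MultiheadAttention(dim // groups, heads, batch_first=True)
            for _ in range(groups)
        ])

    def forward(self, tokens):                      # tokens: (B, N, dim)
        x = self.norm(tokens)
        chunks = x.chunk(self.groups, dim=-1)       # per-group channel slices
        out = [attn(c, c, c)[0] for attn, c in zip(self.attn, chunks)]
        return tokens + torch.cat(out, dim=-1)      # residual connection


if __name__ == "__main__":
    patch = torch.randn(2, 200, 9, 9)               # 2 patches, 200 bands, 9x9 window
    shallow = MultiScaleExtractor(200, 64)(patch)   # (2, 64, 9, 9)
    tokens = shallow.flatten(2).transpose(1, 2)     # (2, 81, 64) pixel tokens
    deep = GroupAttentionBlock(64)(tokens)
    print(shallow.shape, deep.shape)
```

In this sketch the group attention block limits each attention head to a channel subset, which reduces parameters and attention cost relative to full multi-head attention over all channels; the paper's exact grouping strategy may differ.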