CAEVT: Convolutional Autoencoder Meets Lightweight Vision Transformer for Hyperspectral Image Classification.

Affiliations

The State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China.

Beijing Institute for Advanced Study, National University of Defense Technology, Beijing 100020, China.

Publication information

Sensors (Basel). 2022 May 20;22(10):3902. doi: 10.3390/s22103902.

DOI:10.3390/s22103902

PMID:35632310

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9146051/

Abstract

Convolutional neural networks (CNNs) have been prominent in most hyperspectral image (HSI) processing applications because they excel at extracting local information. Despite this success, the locality of convolutional layers makes CNNs heavyweight and time-consuming: each convolution sees only a small neighborhood, so capturing wider context requires stacking many layers. In this study, inspired by the strong performance of transformers in long-range representation learning for computer vision tasks, we built a lightweight vision transformer for HSI classification that extracts local and global information simultaneously, thereby enabling accurate classification. Moreover, because traditional dimensionality reduction methods are limited to linear representations, we adopted a three-dimensional convolutional autoencoder to capture the nonlinear relationships between spectral bands. Combining this three-dimensional convolutional autoencoder with the lightweight vision transformer, we designed an HSI classification network named "convolutional autoencoder meets lightweight vision transformer" (CAEVT). Finally, we validated the performance of CAEVT on four widely used hyperspectral datasets. Our approach outperformed the compared methods, especially when labeled samples were scarce, which demonstrates the effectiveness and efficiency of the CAEVT network.
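The two-stage idea in the abstract (a 3-D convolutional encoder that compresses the spectral bands nonlinearly, followed by attention that relates every pixel to every other pixel) can be illustrated with the minimal NumPy sketch below. It is not the CAEVT architecture: the patch size, kernel shape, spectral stride, and embedding width are hypothetical, the weights are random rather than trained, only the encoder half of the autoencoder is shown (in an autoencoder, a decoder reconstructs the input so the encoder can be trained on reconstruction error), and plain single-head self-attention stands in for the paper's lightweight vision transformer. Perturbing one corner pixel shows the local/global distinction: the convolutional features at the opposite corner are unaffected, while the attention output there changes.

```python
import numpy as np


def conv3d(cube, kernels, stride_b):
    """Naive multi-kernel 3-D convolution (cross-correlation, as in CNN layers).

    cube:     (bands, H, W) hyperspectral patch
    kernels:  (n_k, kb, kh, kw), with kh and kw odd
    stride_b: stride along the spectral axis only
    'Valid' along bands (the band count shrinks), zero-padded 'same' spatially.
    Returns (n_k, out_bands, H, W).
    """
    n_k, kb, kh, kw = kernels.shape
    bands, H, W = cube.shape
    padded = np.pad(cube, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out_b = (bands - kb) // stride_b + 1
    out = np.zeros((n_k, out_b, H, W))
    for n in range(n_k):
        for b in range(out_b):
            s = b * stride_b
            for i in range(H):
                for j in range(W):
                    window = padded[s:s + kb, i:i + kh, j:j + kw]
                    out[n, b, i, j] = np.sum(window * kernels[n])
    return out


def to_tokens(feat):
    """(n_k, out_bands, H, W) -> (H*W, n_k*out_bands): one token per pixel."""
    n_k, out_b, H, W = feat.shape
    return feat.reshape(n_k * out_b, H * W).T


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention over all tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ v, weights


def encode(x, kernels, stride_b):
    # Hypothetical encoder half of a 3-D CAE: the strided conv compresses
    # the bands, and ReLU makes the mapping nonlinear.
    return np.maximum(conv3d(x, kernels, stride_b), 0.0)


rng = np.random.default_rng(0)
bands, H, W = 30, 7, 7                             # hypothetical patch size
cube = rng.standard_normal((bands, H, W))
kernels = 0.1 * rng.standard_normal((4, 7, 3, 3))  # 4 untrained 7x3x3 kernels
stride_b, d_model = 3, 16

feat = encode(cube, kernels, stride_b)  # (4, 8, 7, 7): (30 - 7) // 3 + 1 = 8
d_in = feat.shape[0] * feat.shape[1]    # 4 kernels x 8 reduced bands = 32
wq, wk, wv = (0.1 * rng.standard_normal((d_in, d_model)) for _ in range(3))
out, attn = self_attention(to_tokens(feat), wq, wk, wv)

# Perturb only the far-corner pixel (6, 6) and rerun both stages.
cube2 = cube.copy()
cube2[:, 6, 6] += 5.0
feat2 = encode(cube2, kernels, stride_b)
out2, _ = self_attention(to_tokens(feat2), wq, wk, wv)

# Local: the 3x3 spatial kernel centred on pixel (0, 0) never sees pixel (6, 6).
local_unchanged = np.allclose(feat[:, :, 0, 0], feat2[:, :, 0, 0])
# Global: attention lets token 0 (pixel (0, 0)) mix in every pixel, including (6, 6).
global_changed = not np.allclose(out[0], out2[0])
print(feat.shape, out.shape, local_unchanged, global_changed)
```

With stride-1 3 × 3 kernels, a pixel's receptive field grows by only one pixel in each direction per layer, so a CNN needs roughly k layers to relate pixels k positions apart; a single attention layer relates all pixels at once, at a cost quadratic in the number of tokens.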