Lu Weixiang, Yang Ying, Yang Lei
School of Computer, Electronics and Information, Guangxi University, Nanning, China.
Guangxi Academy of Sciences, Nanning, China.
Front Neurorobot. 2024 May 3;18:1391791. doi: 10.3389/fnbot.2024.1391791. eCollection 2024.
To capture feature information efficiently in fine-grained image classification, this study introduces a new network model based on a hybrid attention approach. The model is built on a hybrid attention module (MA) and, with the help of an attention erasure module (EA), adaptively enhances the salient regions of an image and captures more detailed image information. Specifically, the study designs an attention module that applies the attention mechanism along both the channel and spatial dimensions. This highlights the important regions and key feature channels of the image, so that discriminative local features can be extracted. The study also presents an attention erasure module (EA) that removes the salient regions of the image according to the features already identified, which shifts the network's focus to other feature details and improves the diversity and completeness of the extracted features. In addition, the pooling layer of ResNet50 is modified to enlarge the receptive field and to strengthen feature extraction from the network's shallower layers. The resulting features are then fused to form the final feature representation used for classification. To assess the proposed model, experiments were conducted on three publicly available fine-grained image classification datasets: Stanford Cars, FGVC-Aircraft, and CUB-200-2011. The method achieved classification accuracies of 92.8%, 94.0%, and 88.2% on these datasets, respectively. Compared with existing approaches, the method performs significantly better, with higher accuracy and robustness.
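The two mechanisms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the layer definitions, so the gates below are parameter-free (a sigmoid of average-pooled activations), whereas a trained model would learn them. The function names, the toy feature map, and the `ratio` threshold are all hypothetical choices made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmap):
    """Scale each channel by a gate computed from its global average."""
    out = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gate = sigmoid(mean)
        out.append([[v * gate for v in row] for row in ch])
    return out

def spatial_map(fmap):
    """Per-position attention: sigmoid of the mean over channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]

def spatial_attention(fmap):
    """Scale every position by its spatial attention weight."""
    att = spatial_map(fmap)
    return [[[v * att[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in fmap]

def hybrid_attention(fmap):
    """Channel attention followed by spatial attention (assumed order)."""
    return spatial_attention(channel_attention(fmap))

def erase_salient(fmap, ratio=0.8):
    """Zero out positions whose attention is at least ratio * peak.

    `ratio` is a hypothetical hyperparameter, not taken from the paper.
    """
    att = spatial_map(fmap)
    threshold = ratio * max(max(row) for row in att)
    keep = [[0.0 if a >= threshold else 1.0 for a in row] for row in att]
    return [[[v * keep[i][j] for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in fmap]

# Toy feature map: 2 channels of size 2x2, with one dominant position (0, 0).
fmap = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[2.0, 1.0], [0.0, 0.0]],
]
attended = hybrid_attention(fmap)
erased = erase_salient(fmap)
```

In this toy case the erasure step blanks the dominant position in every channel and leaves the weaker response at position (0, 1) intact, which is the behaviour the abstract attributes to EA: once the most salient region is suppressed, subsequent feature extraction has to rely on the remaining details.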