School of Life Sciences, Northeast Agricultural University, Harbin, China.
State Key Laboratory of Membrane Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.
Sci Rep. 2023 Feb 20;13(1):2966. doi: 10.1038/s41598-023-29600-1.
Insect pest recognition has long been an important problem in agriculture and ecology. Because different insect species can differ only slightly in appearance, even human experts find them hard to tell apart, and machine learning methods are increasingly needed for fine-grained recognition of specific insects. In this study, we propose a feature fusion network that combines the feature representations of different backbone models. First, we use one CNN-based backbone, ResNet, and two attention-based backbones, Vision Transformer and Swin Transformer, to localize the important regions of insect images with Grad-CAM. To make Grad-CAM applicable to these attention-based models, we design new architectures for the two Transformers. We then propose an attention-selection mechanism that reconstructs the attention area by integrating the important regions, so that these partial but key representations complement each other. As a result, only the part of the image that carries the most crucial decision-making information is needed for insect recognition. We randomly selected 20 insect species from the IP102 dataset and then used all 102 classes to test classification performance. Experimental results show that the proposed approach outperforms other advanced CNN-based models. More importantly, our attention-selection mechanism shows good robustness to augmented images.
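The localization and region-integration steps can be illustrated with a minimal sketch. The `grad_cam` function follows the standard Grad-CAM definition (channel weights from spatially averaged gradients, a weighted sum of feature maps, then ReLU). The `select_region` function is a hypothetical stand-in for the attention-selection mechanism: the abstract does not state the integration rule, so a simple union of thresholded maps followed by a bounding box is assumed here. The function names, the `threshold` parameter, and the assumption that all maps have already been resized to a common shape are illustrative and not taken from the paper.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Standard Grad-CAM map from feature maps and gradients, both (K, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # scale to [0, 1]


def select_region(cams, threshold=0.5):
    """Assumed fusion rule (not the paper's): union of thresholded maps.

    Returns (top, bottom, left, right) with exclusive bottom/right,
    or None if no map exceeds the threshold anywhere.
    """
    mask = np.zeros_like(cams[0], dtype=bool)
    for cam in cams:
        mask |= cam >= threshold                      # keep any backbone's key area
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


# Toy maps standing in for the ResNet, ViT, and Swin Transformer heatmaps.
cam_a = np.zeros((8, 8)); cam_a[2:4, 2:5] = 1.0
cam_b = np.zeros((8, 8)); cam_b[3:6, 4:7] = 0.8
cam_c = np.zeros((8, 8))
box = select_region([cam_a, cam_b, cam_c])
image = np.arange(64).reshape(8, 8)
crop = image[box[0]:box[1], box[2]:box[3]]            # region passed on for recognition
print(box, crop.shape)
```

In this sketch the two non-empty maps highlight different but overlapping areas, and the union keeps both, which mirrors the idea that partial regions from different backbones complement each other. A voting or weighted rule would be an equally plausible assumption.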