Zeng Qingtian, Sun Jian, Wang Shansong
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China.
Front Plant Sci. 2024 Jan 25;14:1273029. doi: 10.3389/fpls.2023.1273029. eCollection 2023.
Disease image classification systems play a crucial role in identifying disease categories in agriculture. However, current plant disease image classification methods predict only the disease category and do not explain the characteristics of the predicted disease images. To address this gap, this paper employs image caption generation to produce distinct descriptions for different plant disease categories. A two-stage model called DIC-Transformer, which covers three tasks (detection, interpretation, and classification), is proposed. In the first stage, Faster R-CNN with a Swin Transformer backbone detects the diseased area and generates the feature vector of the diseased image. In the second stage, a Transformer generates image captions; the model then produces an image feature vector weighted by text features, which improves image classification in the subsequent classification decoder. Additionally, a dataset containing text and visualizations for agricultural diseases (ADCG-18) was compiled. The dataset contains images of 18 diseases and descriptive information about their characteristics. Using ADCG-18, DIC-Transformer was compared with 11 existing classical caption generation methods and 10 image classification models. The caption evaluation metrics are BLEU-1 to BLEU-4, CIDEr-D, and ROUGE. DIC-Transformer achieved BLEU-1, CIDEr-D, and ROUGE values of 0.756, 450.51, and 0.721, respectively, which are 0.01, 29.55, and 0.014 higher than those of the highest-performing comparison model, Fc. The classification evaluation metrics are accuracy, recall, and F1 score; DIC-Transformer achieved an accuracy of 0.854, a recall of 0.854, and an F1 score of 0.853, which are 0.024, 0.078, and 0.075 higher than those of the highest-performing comparison model, MobileNetV2. The results indicate that DIC-Transformer outperforms the comparison models in both classification and caption generation.
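The abstract reports DIC-Transformer's scores and its margins over the best comparison models, but not the comparison scores themselves. As a reading aid, the short sketch below recovers the implied scores for Fc and MobileNetV2 by subtraction. The variable and function names are illustrative and do not come from the paper, and because the reported margins are rounded, the recovered values are approximate.

```python
# Scores reported in the abstract for DIC-Transformer.
dic_caption = {"BLEU-1": 0.756, "CIDEr-D": 450.51, "ROUGE": 0.721}
dic_classify = {"accuracy": 0.854, "recall": 0.854, "F1": 0.853}

# Margins reported in the abstract over the best comparison models.
margin_over_fc = {"BLEU-1": 0.01, "CIDEr-D": 29.55, "ROUGE": 0.014}
margin_over_mobilenetv2 = {"accuracy": 0.024, "recall": 0.078, "F1": 0.075}


def implied_baseline(scores, margins, ndigits=3):
    """Subtract each reported margin from the matching reported score.

    Rounding removes floating-point noise; it does not add precision
    beyond what the abstract states.
    """
    return {name: round(scores[name] - margins[name], ndigits) for name in scores}


fc_scores = implied_baseline(dic_caption, margin_over_fc)
mobilenetv2_scores = implied_baseline(dic_classify, margin_over_mobilenetv2)

print("Implied Fc scores:", fc_scores)
print("Implied MobileNetV2 scores:", mobilenetv2_scores)
```

Under these assumptions, Fc would score roughly 0.746 (BLEU-1), 420.96 (CIDEr-D), and 0.707 (ROUGE), and MobileNetV2 roughly 0.830 (accuracy), 0.776 (recall), and 0.778 (F1 score).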