
Informed-Learning-Guided Visual Question Answering Model of Crop Disease.

Author Information

Zhao Yunpeng, Wang Shansong, Zeng Qingtian, Ni Weijian, Duan Hua, Xie Nengfu, Xiao Fengjin

Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China.

Agricultural Information Institute of CAAS, Beijing 100081, China.

Publication Information

Plant Phenomics. 2024 Dec 16;6:0277. doi: 10.34133/plantphenomics.0277. eCollection 2024.

Abstract

In contemporary agriculture, experts develop preventative and remedial strategies for various disease stages in diverse crops. Decision-making regarding the stages of disease occurrence exceeds the capabilities of single-image tasks, such as image classification and object detection. Consequently, research now focuses on training visual question answering (VQA) models. However, existing studies concentrate on identifying disease species rather than formulating questions that encompass crucial multiattributes. Additionally, model performance is susceptible to the model structure and dataset biases. To address these challenges, we construct the informed-learning-guided VQA model of crop disease (ILCD). ILCD improves model performance by integrating coattention, a multimodal fusion model (MUTAN), and a bias-balancing (BiBa) strategy. To facilitate the investigation of various visual attributes of crop diseases and the determination of disease occurrence stages, we construct a new VQA dataset called the Crop Disease Multi-attribute VQA with Prior Knowledge (CDwPK-VQA). This dataset contains comprehensive information on various visual attributes such as shape, size, status, and color. We expand the dataset by integrating prior knowledge into CDwPK-VQA to address performance challenges. Comparative experiments are conducted by ILCD on the VQA-v2, VQA-CP v2, and CDwPK-VQA datasets, achieving accuracies of 68.90%, 49.75%, and 86.06%, respectively. Ablation experiments are conducted on CDwPK-VQA to evaluate the effectiveness of various modules, including coattention, MUTAN, and BiBa. These experiments demonstrate that ILCD exhibits the highest level of accuracy, performance, and value in the field of agriculture. The source codes can be accessed at https://github.com/SdustZYP/ILCD-master/tree/main.
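The fusion component the abstract names, MUTAN, is a Tucker-decomposition-based bilinear pooling: question and image features are each projected into a shared hidden space, fused by elementwise product, and projected to the output space. The sketch below illustrates that pattern in NumPy with hypothetical toy dimensions; it is not the ILCD implementation (the full MUTAN sums several such rank-constrained products over batched features, and the authors' code is at the GitHub link above).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration only.
d_q, d_v, d_hidden, d_out = 16, 16, 8, 4

# Mode factor matrices and (simplified, single-rank) core projection
# of the Tucker decomposition of the full bilinear interaction tensor.
W_q = rng.normal(size=(d_q, d_hidden))
W_v = rng.normal(size=(d_v, d_hidden))
core = rng.normal(size=(d_hidden, d_out))

def mutan_fuse(q, v):
    """MUTAN-style fusion: project each modality into a shared space,
    take the elementwise product, then project to the output space."""
    q_h = np.tanh(q @ W_q)  # question features -> hidden space
    v_h = np.tanh(v @ W_v)  # visual features  -> hidden space
    return (q_h * v_h) @ core

q = rng.normal(size=d_q)  # stand-in for an encoded question
v = rng.normal(size=d_v)  # stand-in for encoded image features
z = mutan_fuse(q, v)
print(z.shape)  # (4,)
```

In the full model this fused vector would feed an answer classifier; the elementwise product in the shared space is what lets the fusion capture multiplicative question-image interactions at far lower parameter cost than a dense bilinear map.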


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/50f82c6a6616/plantphenomics.0277.fig.001.jpg
