Informed-Learning-Guided Visual Question Answering Model of Crop Disease.

Authors

Zhao Yunpeng, Wang Shansong, Zeng Qingtian, Ni Weijian, Duan Hua, Xie Nengfu, Xiao Fengjin

Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China.

Agricultural Information Institute of CAAS, Beijing 100081, China.

Publication information

Plant Phenomics. 2024 Dec 16;6:0277. doi: 10.34133/plantphenomics.0277. eCollection 2024.

DOI:10.34133/plantphenomics.0277
PMID:39687877
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11649200/
Abstract

In contemporary agriculture, experts develop preventative and remedial strategies for various disease stages in diverse crops. Decision-making regarding the stages of disease occurrence exceeds the capabilities of single-image tasks, such as image classification and object detection. Consequently, research now focuses on training visual question answering (VQA) models. However, existing studies concentrate on identifying disease species rather than formulating questions that encompass crucial multiattributes. Additionally, model performance is susceptible to the model structure and dataset biases. To address these challenges, we construct the informed-learning-guided VQA model of crop disease (ILCD). ILCD improves model performance by integrating coattention, a multimodal fusion model (MUTAN), and a bias-balancing (BiBa) strategy. To facilitate the investigation of various visual attributes of crop diseases and the determination of disease occurrence stages, we construct a new VQA dataset called the Crop Disease Multi-attribute VQA with Prior Knowledge (CDwPK-VQA). This dataset contains comprehensive information on various visual attributes such as shape, size, status, and color. We expand the dataset by integrating prior knowledge into CDwPK-VQA to address performance challenges. Comparative experiments are conducted by ILCD on the VQA-v2, VQA-CP v2, and CDwPK-VQA datasets, achieving accuracies of 68.90%, 49.75%, and 86.06%, respectively. Ablation experiments are conducted on CDwPK-VQA to evaluate the effectiveness of various modules, including coattention, MUTAN, and BiBa. These experiments demonstrate that ILCD exhibits the highest level of accuracy, performance, and value in the field of agriculture. The source codes can be accessed at https://github.com/SdustZYP/ILCD-master/tree/main.
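The abstract names MUTAN as the multimodal fusion component of ILCD. As a rough illustration of what MUTAN-style fusion computes, the NumPy sketch below combines a question vector and an image vector through a rank-constrained bilinear (Tucker-style) interaction. It is not the authors' implementation, which lives in the repository linked above. The function name `mutan_fusion`, all dimensions, the random weights, and the `tanh` nonlinearity are assumptions made for illustration only.

```python
import numpy as np


def mutan_fusion(q, v, W_q, W_v, M, N, W_o):
    """Fuse a question vector q and an image vector v, MUTAN-style.

    Shapes (all hypothetical): q (d_q,), v (d_v,), W_q (d_q, t_q),
    W_v (d_v, t_v), M (R, t_q, t_o), N (R, t_v, t_o), W_o (t_o, n_answers).
    """
    # Project each modality into a smaller space (tanh is an assumed choice).
    q_t = np.tanh(q @ W_q)  # (t_q,)
    v_t = np.tanh(v @ W_v)  # (t_v,)
    # Rank-R core interaction: sum of R elementwise products of projections.
    z = sum((q_t @ M_r) * (v_t @ N_r) for M_r, N_r in zip(M, N))  # (t_o,)
    # Map the fused vector to answer scores.
    return z @ W_o  # (n_answers,)


# Toy inputs with made-up sizes; real models use learned weights.
rng = np.random.default_rng(0)
d_q, d_v, t_q, t_v, t_o, R, n_answers = 8, 10, 4, 5, 6, 3, 7
q, v = rng.normal(size=d_q), rng.normal(size=d_v)
W_q, W_v = rng.normal(size=(d_q, t_q)), rng.normal(size=(d_v, t_v))
M, N = rng.normal(size=(R, t_q, t_o)), rng.normal(size=(R, t_v, t_o))
W_o = rng.normal(size=(t_o, n_answers))

logits = mutan_fusion(q, v, W_q, W_v, M, N, W_o)
print(logits.shape)
```

The rank-R sum is equivalent to contracting both projected vectors with a full core tensor whose slices are constrained to rank R. It avoids materializing that tensor, which is the usual motivation for this kind of factorization.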

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/50f82c6a6616/plantphenomics.0277.fig.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/8a27e8f0b2b4/plantphenomics.0277.fig.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/b3aa4504f4a4/plantphenomics.0277.fig.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/9e452eb0c0af/plantphenomics.0277.fig.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/84f8e2890068/plantphenomics.0277.fig.005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/1e44630eff4c/plantphenomics.0277.fig.006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/d5366c2d593b/plantphenomics.0277.fig.007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/fcbcb09ae5d3/plantphenomics.0277.fig.008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/67d095e699bc/plantphenomics.0277.fig.009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c85f/11649200/5b39c8ef216d/plantphenomics.0277.fig.010.jpg

Similar articles

1
Informed-Learning-Guided Visual Question Answering Model of Crop Disease.
Plant Phenomics. 2024 Dec 16;6:0277. doi: 10.34133/plantphenomics.0277. eCollection 2024.
2
Counterfactual Dual-Bias VQA: A Multimodality Debias Learning for Robust Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2025 Sep;36(9):16366-16378. doi: 10.1109/TNNLS.2025.3562085.
3
Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering.
J Imaging. 2024 Feb 23;10(3):56. doi: 10.3390/jimaging10030056.
4
Interpretable medical image Visual Question Answering via multi-modal relationship graph learning.
Med Image Anal. 2024 Oct;97:103279. doi: 10.1016/j.media.2024.103279. Epub 2024 Jul 20.
5
Advancing surgical VQA with scene graph knowledge.
Int J Comput Assist Radiol Surg. 2024 Jul;19(7):1409-1417. doi: 10.1007/s11548-024-03141-y. Epub 2024 May 23.
6
Robust visual question answering via polarity enhancement and contrast.
Neural Netw. 2024 Nov;179:106560. doi: 10.1016/j.neunet.2024.106560. Epub 2024 Jul 20.
7
Rich Visual Knowledge-Based Augmentation Network for Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2021 Oct;32(10):4362-4373. doi: 10.1109/TNNLS.2020.3017530. Epub 2021 Oct 5.
8
Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering.
J Biomed Inform. 2024 Dec;160:104748. doi: 10.1016/j.jbi.2024.104748. Epub 2024 Nov 12.
9
Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA.
IEEE Trans Multimedia. 2024;26:8609-8624. doi: 10.1109/tmm.2024.3380259. Epub 2024 Mar 21.
10
Multitask Learning for Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2023 Mar;34(3):1380-1394. doi: 10.1109/TNNLS.2021.3105284. Epub 2023 Feb 28.
