Few-Shot Image Classification of Crop Diseases Based on Vision-Language Models.

Affiliations

School of Information Engineering, China University of Geosciences, Beijing 100083, China.

State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.

Publication information

Sensors (Basel). 2024 Sep 21;24(18):6109. doi: 10.3390/s24186109.

DOI: 10.3390/s24186109
PMID: 39338855
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11435512/
Abstract

Accurate crop disease classification is crucial for ensuring food security and enhancing agricultural productivity. However, the existing crop disease classification algorithms primarily focus on a single image modality and typically require a large number of samples. Our research counters these issues by using pre-trained Vision-Language Models (VLMs), which enhance the multimodal synergy for better crop disease classification than the traditional unimodal approaches. Firstly, we apply the multimodal model Qwen-VL to generate meticulous textual descriptions for representative disease images selected through clustering from the training set, which will serve as prompt text for generating classifier weights. Compared to solely using the language model for prompt text generation, this approach better captures and conveys fine-grained and image-specific information, thereby enhancing the prompt quality. Secondly, we integrate cross-attention and SE (Squeeze-and-Excitation) Attention into the training-free mode VLCD (Vision-Language model for Crop Disease classification) and the training-required mode VLCD-T (VLCD-Training), respectively, for prompt text processing, enhancing the classifier weights by emphasizing the key text features. The experimental outcomes conclusively prove our method's heightened classification effectiveness in few-shot crop disease scenarios, tackling the data limitations and intricate disease recognition issues. It offers a pragmatic tool for agricultural pathology and reinforces the smart farming surveillance infrastructure.
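
The abstract outlines two steps that a small sketch may make more concrete: choosing representative images by clustering, and building per-class classifier weights from prompt-text features with cross-attention. The Python below is only an illustration under stated assumptions, not the authors' implementation. The function names (`pick_representatives`, `attention_pooled_weight`, `classify`) are hypothetical, random arrays stand in for the image and text embeddings a VLM encoder would produce, the Qwen-VL description step is not shown, and the wiring of the cross-attention (image feature as query, prompt features as keys and values, no learned projections) is an assumption, since the abstract does not specify it.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def pick_representatives(features, k, iters=10):
    """Cluster image features with k-means and return, for each cluster,
    the index of the sample closest to its centroid."""
    # Farthest-point initialisation (an assumption; keeps the sketch deterministic).
    chosen = [0]
    for _ in range(k - 1):
        gaps = np.linalg.norm(features[:, None] - features[chosen][None], axis=-1)
        chosen.append(int(gaps.min(axis=1).argmax()))
    centroids = features[chosen].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    dists = np.linalg.norm(features[:, None] - centroids[None], axis=-1)
    return [int(i) for i in dists.argmin(axis=0)]


def attention_pooled_weight(image_feat, prompt_feats):
    """Pool one class's prompt features into a single weight vector,
    using the image feature as the attention query (assumed wiring)."""
    scores = prompt_feats @ image_feat / np.sqrt(image_feat.shape[-1])
    attn = np.exp(scores - scores.max())
    attn = attn / attn.sum()
    return l2_normalize(attn @ prompt_feats)


def classify(image_feat, prompts_by_class):
    """Return (predicted class index, cosine-similarity logits)."""
    image_feat = l2_normalize(image_feat)
    weights = np.stack([
        attention_pooled_weight(image_feat, l2_normalize(p))
        for p in prompts_by_class
    ])
    logits = weights @ image_feat
    return int(logits.argmax()), logits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in embeddings: 3 classes, 4 prompt descriptions each, 16 dimensions.
    prompts = [np.eye(3, 16)[c] + 0.05 * rng.normal(size=(4, 16)) for c in range(3)]
    query = np.eye(3, 16)[1] + 0.05 * rng.normal(size=16)
    print(classify(query, prompts))
```

In this sketch no parameters are learned, which loosely mirrors the training-free mode; a training-required variant would add learnable components (the abstract names SE attention for that role), which are deliberately left out here.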
