A large language model for multimodal identification of crop diseases and pests.

Author information

Wang Yiqun, Wang Fahai, Chen Wenbai, Lv Bowen, Liu Mengchen, Kong Xiangyuan, Zhao Chunjiang, Pan Zhaocen

Affiliations

School of Automation, Beijing Information Science and Technology University, Beijing, 100192, China.

Beijing Research Center for Information Technology in Agriculture, Beijing, 100097, China.

Publication information

Sci Rep. 2025 Jul 1;15(1):21959. doi: 10.1038/s41598-025-01908-0.

Abstract

Pests and diseases significantly impact the growth and development of crops. When asked to identify disease characteristics in crop images through dialogue, existing multimodal models often misinterpret the image and return incorrect disease information. This paper proposes a large language model for multimodal identification of crop diseases and pests, called LLMI-CDP. It builds on the VisualGLM model and introduces improvements to identify crop disease and pest images precisely and to provide professional recommendations for relevant preventive measures. Low-Rank Adaptation (LoRA), which adjusts the weights of the pre-trained model, achieves significant performance improvements with a minimal increase in parameters. This enables precise capture and efficient identification of crop pest and disease characteristics, improving the model's flexibility and accuracy in pest and disease recognition. The model also incorporates the Q-Former framework to align image features with the language model. With this alignment, LLMI-CDP can better understand and process the complex relationships between language and visual information, further improving its performance in multimodal recognition tasks. Experiments are carried out on self-built datasets. The results show that LLMI-CDP surpasses five leading multimodal large language models on the relevant evaluation metrics, confirming its strong performance in Chinese multimodal dialogues related to agriculture.
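
The LoRA idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, rank, scaling factor, and all function names below are assumptions chosen for illustration. The sketch keeps a pre-trained weight matrix W frozen and adds a trainable low-rank update, so the layer computes x(W + (alpha / r)AB), where A and B together hold far fewer parameters than W.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w, a, b, alpha, rank):
    """Compute x @ W + (alpha / rank) * (x @ A) @ B without forming A @ B."""
    base = matmul(x, w)                 # frozen pre-trained path
    low_rank = matmul(matmul(x, a), b)  # trainable low-rank path
    scale = alpha / rank
    return [[p + scale * q for p, q in zip(base_row, lr_row)]
            for base_row, lr_row in zip(base, low_rank)]

def count(m):
    """Number of scalar entries in a matrix."""
    return sum(len(row) for row in m)

random.seed(0)
# Illustrative sizes only; the abstract does not state the values used.
d_in, d_out, rank, alpha = 64, 64, 4, 8

# W stands in for a frozen pre-trained weight matrix.
w = [[random.gauss(0, 0.02) for _ in range(d_out)] for _ in range(d_in)]
# A is randomly initialised and B starts at zero, so the update is zero at first.
a = [[random.gauss(0, 0.02) for _ in range(rank)] for _ in range(d_in)]
b = [[0.0] * d_out for _ in range(rank)]

x = [[random.gauss(0, 1) for _ in range(d_in)]]
y_base = matmul(x, w)
y_lora = lora_forward(x, w, a, b, alpha, rank)

frozen_params = count(w)
trainable_params = count(a) + count(b)
print(frozen_params, trainable_params)
```

With these assumed sizes, the frozen matrix has 4096 entries while the adapter adds 512 trainable ones, which is the sense in which LoRA adapts a model with a small parameter increase. Because B starts at zero, the adapted layer initially reproduces the pre-trained output and only departs from it as A and B are trained.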
