Cross-Modal Data Fusion via Vision-Language Model for Crop Disease Recognition.

Author Information

Liu Wenjie, Wu Guoqing, Wang Han, Ren Fuji

Affiliations

School of Transportation and Civil Engineering, Nantong University, Nantong 226019, China.

School of Mechanical Engineering, Nantong Institute of Technology, Nantong 226002, China.

Publication Information

Sensors (Basel). 2025 Jun 30;25(13):4096. doi: 10.3390/s25134096.

Abstract

Crop diseases pose a significant threat to agricultural productivity and global food security. Timely and accurate disease identification is crucial for improving crop yield and quality. While most existing deep learning-based methods focus primarily on image datasets for disease recognition, they often overlook the complementary role of textual features in enhancing visual understanding. To address this problem, we propose a cross-modal data fusion method via a vision-language model for crop disease recognition. Our approach leverages the Zhipu.ai multimodal model to generate comprehensive textual descriptions of crop leaf diseases, including a global description, a local lesion description, and a color-texture description. These descriptions are encoded into feature vectors, while an image encoder extracts image features. A cross-attention mechanism then iteratively fuses the multimodal features across multiple layers, and a classification prediction module generates classification probabilities. Extensive experiments on the Soybean Disease, AI Challenge 2018, and PlantVillage datasets demonstrate that our method outperforms state-of-the-art image-only approaches with higher accuracy and fewer parameters. Specifically, with only 1.14M model parameters, our model achieves 98.74%, 87.64%, and 99.08% recognition accuracy on the three datasets, respectively. The results highlight the effectiveness of cross-modal learning in leveraging both visual and textual cues for precise and efficient disease recognition, offering a scalable solution for crop disease recognition.
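The fusion pipeline the abstract describes — encoded text descriptions attending to image features through a cross-attention mechanism applied iteratively across layers, followed by a classification head — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the embedding dimension, layer count, patch count, and random classifier weights are all assumptions made for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, keyval, d):
    # query: (n_q, d) text features; keyval: (n_k, d) image features
    scores = query @ keyval.T / np.sqrt(d)   # (n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)       # attend over image patches
    return weights @ keyval                  # (n_q, d) image-conditioned features

rng = np.random.default_rng(0)
d = 64                                       # shared embedding dimension (assumed)
text_feats = rng.standard_normal((3, d))     # global / lesion / color-texture descriptions
img_feats = rng.standard_normal((16, d))     # image patch embeddings (assumed 16 patches)

# Iterative cross-modal fusion across layers (layer count assumed)
fused = text_feats
for _ in range(2):
    fused = fused + cross_attention(fused, img_feats, d)  # residual update

# Toy classification head over the pooled fused representation
logits = fused.mean(axis=0) @ rng.standard_normal((d, 10))
probs = softmax(logits)                      # class probabilities
```

Here each text description vector queries the image patch features, so lesion-related wording can emphasize the matching image regions; repeating the attention step mimics the multi-layer iterative fusion before classification.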


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a76/12251865/75099bf90839/sensors-25-04096-g001.jpg
