ConvTransNet-S: A CNN-Transformer Hybrid Disease Recognition Model for Complex Field Environments.

Authors

Jia Shangyun, Wang Guanping, Li Hongling, Liu Yan, Shi Linrong, Yang Sen

Affiliation

College of Mechanical and Electrical Engineering, Gansu Agricultural University, Lanzhou 730070, China.

Publication

Plants (Basel). 2025 Jul 22;14(15):2252. doi: 10.3390/plants14152252.

DOI: 10.3390/plants14152252

PMID: 40805601

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12348439/

Abstract

To address the low recognition accuracy and high model complexity of crop disease identification models operating in complex field environments, this study proposed a novel hybrid model named ConvTransNet-S, which integrates Convolutional Neural Networks (CNNs) and transformers for crop disease identification. Unlike existing hybrid approaches, ConvTransNet-S introduces three key innovations. First, a Local Perception Unit (LPU) and a Lightweight Multi-Head Self-Attention (LMHSA) module were introduced; working synergistically, they enhance the extraction of fine-grained plant disease details and model global dependency relationships, respectively. Second, an Inverted Residual Feed-Forward Network (IRFFN) was employed to optimize the feature propagation path, thereby enhancing the model's robustness against interference such as lighting variations and leaf occlusion. This combination of the LPU, LMHSA, and IRFFN achieves a dynamic balance between local texture perception and global context modeling, effectively resolving the trade-offs inherent in standalone CNNs or transformers. Finally, a stage-wise (phased) architecture design achieves efficient fusion of multi-scale disease features, which enhances feature discriminability while reducing model complexity. Experimental results showed that ConvTransNet-S achieved a recognition accuracy of 98.85% on the public PlantVillage dataset, with only 25.14 million parameters, a computational load of 3.762 GFLOPs, and an inference time of 7.56 ms. On a self-built dataset of 10,441 images of complex in-field scenes, ConvTransNet-S achieved an accuracy of 88.53%, improvements of 14.22%, 2.75%, and 0.34% over EfficientNetV2, Vision Transformer, and Swin Transformer, respectively. Furthermore, under complex background conditions, ConvTransNet-S achieved up to 14.22% higher disease recognition accuracy while reducing the parameter count by 46.8%. This confirms that its multi-scale feature mechanism can effectively distinguish disease features from background features, providing a novel technical approach for disease diagnosis in complex agricultural scenarios and demonstrating significant application value for intelligent agricultural management.
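The abstract names the three building blocks (LPU, LMHSA, IRFFN) but not how they are wired together. The sketch below assumes the block arrangement of CMT (Guo et al., 2022), where these three modules were originally proposed: x = LPU(x), then x = x + LMHSA(LN(x)), then x = x + IRFFN(LN(x)). It is a forward-pass-only NumPy illustration with random weights, not the authors' ConvTransNet-S implementation. Simplifications are assumptions: k×k average pooling stands in for the strided depthwise convolution CMT uses to shrink keys and values, batch normalization is omitted, the attention bias is random rather than a learned relative-position bias, and every function and parameter name (`hybrid_block`, `init_params`, `lpu_dw`, and so on) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def depthwise_conv3x3(x, w):
    """Depthwise 3x3 convolution (stride 1, zero padding 1).
    x: (C, H, W) feature map; w: (C, 3, 3), one kernel per channel."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out


def pointwise(x, w):
    """1x1 convolution; w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def layer_norm(x, eps=1e-5):
    # normalize every spatial position across the channel axis (no affine params)
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def lpu(x, p):
    # Local Perception Unit: depthwise conv plus identity shortcut
    return depthwise_conv3x3(x, p["lpu_dw"]) + x


def lmhsa(x, p, heads=2, k=2):
    # Lightweight MHSA: queries use all N tokens, keys/values use N / k^2 tokens
    c, h, w = x.shape
    d = c // heads
    tokens = x.reshape(c, h * w).T                      # (N, C)
    # k x k average pooling, stride k (stand-in for a strided depthwise conv)
    reduced = x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))
    reduced = reduced.reshape(c, -1).T                  # (N / k^2, C)
    q, key, v = tokens @ p["wq"], reduced @ p["wk"], reduced @ p["wv"]
    out = np.empty_like(q)
    for hd in range(heads):
        s = slice(hd * d, (hd + 1) * d)
        scores = q[:, s] @ key[:, s].T / np.sqrt(d) + p["bias"][hd]
        out[:, s] = softmax(scores) @ v[:, s]
    return (out @ p["wo"]).T.reshape(c, h, w)


def irffn(x, p):
    # Inverted residual FFN: 1x1 expand -> depthwise 3x3 with shortcut -> 1x1 project
    e = gelu(pointwise(x, p["ffn_in"]))
    e = gelu(depthwise_conv3x3(e, p["ffn_dw"]) + e)
    return pointwise(e, p["ffn_out"])


def hybrid_block(x, p):
    x = lpu(x, p)                        # local texture perception
    x = x + lmhsa(layer_norm(x), p)      # global context modeling
    return x + irffn(layer_norm(x), p)   # feature propagation


def init_params(c=16, expand=4, heads=2, n_tokens=64, n_reduced=16, scale=0.1):
    def r(*shape):
        return rng.normal(0.0, scale, shape)
    return {
        "lpu_dw": r(c, 3, 3),
        "wq": r(c, c), "wk": r(c, c), "wv": r(c, c), "wo": r(c, c),
        "bias": r(heads, n_tokens, n_reduced),  # random stand-in for a learned bias
        "ffn_in": r(c * expand, c),
        "ffn_dw": r(c * expand, 3, 3),
        "ffn_out": r(c, c * expand),
    }


x = rng.normal(size=(16, 8, 8))   # C=16 channels on an 8x8 feature map
y = hybrid_block(x, init_params())
print(y.shape)  # (16, 8, 8)
```

Each block preserves the feature-map shape, so blocks can be stacked. A stage-wise design would stack several such blocks per stage and reduce spatial resolution between stages (for example with a strided convolution), giving later stages coarser, more global features; the actual number of stages, blocks, and channels in ConvTransNet-S is not stated in this abstract.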
