Meng Qingduan, Guo Jiadong, Zhang Hui, Zhou Yaoqi, Zhang Xiaoling
College of Information Engineering, Henan University of Science and Technology, Luoyang, Henan, China.
PLoS One. 2025 Apr 24;20(4):e0321753. doi: 10.1371/journal.pone.0321753. eCollection 2025.
Computer vision holds tremendous potential for crop disease classification, but the complex texture and shape characteristics of crop diseases make the task challenging. To address these issues, this paper proposes a dual-branch model for crop disease classification that combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT). The convolutional branch captures local features, while the Transformer branch handles global features. A learnable parameter achieves a linear weighted fusion of these two types of features. An Aggregated Local Perceptive Feed Forward Layer (ALP-FFN) is introduced to enhance the model's representation capability by adding locality to the Transformer encoder. Furthermore, this paper constructs a lightweight Transformer block using ALP-FFN and a linear self-attention mechanism to reduce the model's parameter count and computational cost. The proposed model achieves an exceptional classification accuracy of 99.71% on the PlantVillage dataset with only 4.9M parameters and 0.62G FLOPs, surpassing the state-of-the-art TNT-S model (accuracy: 99.11%, parameters: 23.31M, FLOPs: 4.85G) by 0.6%. On the Potato Leaf dataset, the model attains 98.78% classification accuracy, outperforming the advanced ResNet-18 model (accuracy: 98.05%, parameters: 11.18M, FLOPs: 1.82G) by 0.73%. The proposed model effectively combines the advantages of CNN and ViT while maintaining a lightweight design, providing an effective method for the precise identification of crop diseases.
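The learnable linear weighted fusion of the two branches can be sketched as follows. This is a minimal, dependency-free illustration of the idea only: the gate name `alpha`, the sigmoid squashing, and the `fuse` helper are assumptions for exposition, not details taken from the paper.

```python
import math

def fuse(local_feat, global_feat, alpha):
    """Blend CNN (local) and ViT (global) features of equal shape.

    w = sigmoid(alpha) keeps the mixing weight in (0, 1); in the actual
    model, alpha would be a learnable scalar updated by backpropagation.
    """
    w = 1.0 / (1.0 + math.exp(-alpha))
    return [w * l + (1.0 - w) * g for l, g in zip(local_feat, global_feat)]

# With alpha = 0, sigmoid gives 0.5, i.e. an even blend of the branches.
print(fuse([1.0, 2.0], [3.0, 4.0], 0.0))  # → [2.0, 3.0]
```

During training, the network itself learns how much to trust each branch per deployment, rather than fixing the local/global balance by hand.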