Xu Jiaxin, Zhou Hongliang, Hu Yufan, Xue Yongfei, Zhou Guoxiong, Li Liujun, Dai Weisi, Li Jinyang
College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China.
Department of Soil and Water Systems, University of Idaho, Moscow, ID 83844, USA.
Plants (Basel). 2024 Apr 23;13(9):1176. doi: 10.3390/plants13091176.
Tomato leaf disease control in smart agriculture urgently requires attention and reinforcement. This paper proposes LAFANet, an image-text retrieval method that integrates image and text information for joint analysis of multimodal data, giving agricultural practitioners more comprehensive and in-depth diagnostic evidence to help ensure tomato quality and yield. First, we focus on images and text descriptions of six common tomato leaf diseases and create a Tomato Leaf Disease Image-Text Retrieval Dataset (TLDITRD), introducing image-text retrieval into the field of tomato leaf disease retrieval. Then, using ViT and BERT models, we extract detailed image features and sequences of textual features, incorporating contextual information from image-text pairs. To address retrieval errors caused by complex backgrounds, we propose Learnable Fusion Attention (LFA), which strengthens the fusion of textual and image features and thereby extracts rich semantic information from both modalities. To further explore the semantic connections across modalities, we propose a False Negative Elimination-Adversarial Negative Selection (FNE-ANS) approach. This method identifies adversarial negative instances that specifically target false negatives within the triplet loss function, thereby imposing constraints on the model. To improve the model's generalization and precision, we propose Adversarial Regularization (AR), which adds adversarial perturbations during training to make the model more robust and adaptable to slight variations in the input data. Experimental results show that LAFANet outperformed existing state-of-the-art models on TLDITRD, with top1, top5, and top10 reaching 83.3% and 90.0%, and top1, top5, and top10 reaching 80.3%, 93.7%, and 96.3%. LAFANet offers new technical support and algorithmic insights for the retrieval of tomato leaf disease through image-text correlation.
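To make the training constraint and the reported top-k figures more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a precomputed image-to-text similarity matrix in which entry (i, i) is the matched pair, and it uses plain hardest-negative mining with a label-based filter as a stand-in for FNE-ANS: the abstract does not say how false negatives are identified, so treating captions that share a disease label with the query image as suspected false negatives is an assumption. The function names, the margin value, and the toy labels are all hypothetical.

```python
from typing import List, Sequence


def triplet_loss_hardest_negative(
    sim: Sequence[Sequence[float]],
    labels: Sequence[str],
    margin: float = 0.2,
    filter_same_label: bool = True,
) -> float:
    """Image-to-text hinge triplet loss using the hardest negative per image.

    sim[i][j] is the similarity between image i and text j; sim[i][i] is the
    positive pair. When filter_same_label is True, texts whose label equals
    the image's label are skipped as suspected false negatives (assumption).
    """
    total = 0.0
    n = len(sim)
    for i in range(n):
        positive = sim[i][i]
        negatives: List[float] = [
            sim[i][j]
            for j in range(n)
            if j != i and not (filter_same_label and labels[j] == labels[i])
        ]
        if not negatives:
            continue  # no usable negative for this image
        hardest = max(negatives)  # the most similar non-matching text
        total += max(0.0, margin + hardest - positive)
    return total / n


def top_k_accuracy(sim: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of images whose matched text is among the k most similar texts."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Toy data (hypothetical): three image-text pairs, two sharing a disease label.
sim = [
    [0.9, 0.8, 0.1],
    [0.7, 0.6, 0.2],
    [0.3, 0.1, 0.5],
]
labels = ["disease_a", "disease_a", "disease_b"]

loss_filtered = triplet_loss_hardest_negative(sim, labels)
loss_unfiltered = triplet_loss_hardest_negative(sim, labels, filter_same_label=False)
top1 = top_k_accuracy(sim, 1)
top2 = top_k_accuracy(sim, 2)

print(loss_filtered, loss_unfiltered, top1, top2)
```

In this toy case the unfiltered loss penalizes the model for ranking a caption of the same disease close to the query image, while the filtered loss does not, which is the intuition behind removing false negatives before choosing hard negatives.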