Pham Tri-Cong, Nguyen Tien-Nam, Nguyen Van-Duy
Thuyloi University, 175 Tay Son, Dong Da, Hanoi, 10000, Vietnam.
L3i Laboratory, University of La Rochelle, 17000, La Rochelle, France.
Sci Rep. 2025 Apr 23;15(1):14070. doi: 10.1038/s41598-025-95849-3.
In deep learning, semi-supervised learning is an effective technique that enhances neural network training by leveraging both labeled and unlabeled data. A trained model generates pseudo labels for the unlabeled samples, which are then used to further train the original model, yielding a new model. However, if these pseudo labels contain substantial errors, the resulting model's accuracy may drop, potentially falling below that of the initial model. To tackle this problem, we propose an Ambiguity-Aware Semi-Supervised Learning method for leaf disease classification. Specifically, we present a per-disease ambiguity rejection algorithm that eliminates ambiguous predictions, thereby improving the precision of the pseudo labels used in the subsequent semi-supervised training step and the precision of the final classifier. The proposed method is evaluated on two public leaf disease datasets, coffee and banana, across various data scenarios, including supervised and semi-supervised settings with varying proportions of labeled data. The results indicate that, by using the ambiguity rejection algorithm, our semi-supervised method reduces the reliance on fully labeled datasets while preserving high accuracy. Additionally, the rejection algorithm significantly boosts the precision of the final classifier on the coffee and banana datasets, reaching 99.46% and 100.0%, respectively, while using only 50% of the labeled data. The study also presents a thorough set of experiments and analyses to validate the effectiveness of the proposed method, comparing its performance against state-of-the-art supervised approaches. The results demonstrate that our method, despite using only 50% of the labeled data, achieves performance competitive with fully supervised models that use 100% of the labeled data.
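The abstract does not spell out the rejection rule, so the following is only a minimal sketch of how per-class rejection of ambiguous pseudo labels could look, under the assumption that a prediction is kept when its top softmax probability meets a threshold specific to the predicted class. The function name `select_pseudo_labels`, the threshold values, and the example probabilities are all hypothetical and are not taken from the paper.

```python
import numpy as np


def select_pseudo_labels(probs, thresholds):
    """Keep only unlabeled samples whose prediction is not ambiguous.

    Assumption (not from the paper): a sample is accepted when its
    top-class probability is at least the threshold of that class.
    Returns the indices of accepted samples and their pseudo labels.
    """
    probs = np.asarray(probs, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)

    predicted = probs.argmax(axis=1)   # most likely class per sample
    confidence = probs.max(axis=1)     # probability of that class

    # Compare each confidence with the threshold of its predicted class.
    accepted = confidence >= thresholds[predicted]
    keep = np.flatnonzero(accepted)
    return keep, predicted[keep]


# Hypothetical softmax outputs for four unlabeled leaf images, three classes.
probs = np.array([
    [0.97, 0.02, 0.01],
    [0.55, 0.40, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.15, 0.80],
])
# Hypothetical per-class thresholds (one per disease class).
thresholds = np.array([0.90, 0.80, 0.95])

keep, labels = select_pseudo_labels(probs, thresholds)
print(keep.tolist(), labels.tolist())
```

In a pseudo-labeling loop, only the samples in `keep` would be added to the training set with `labels` as targets; the rejected samples would stay unlabeled. Setting one threshold per class, rather than a single global one, lets a class that is easily confused with others be filtered more strictly.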