Kondaveeti Hari Kishan, Simhadri Chinna Gopi
School of Computer Science and Engineering, VIT-AP University, Amaravathi, 522237, Andhra Pradesh, India.
Department of Computer Science and Engineering, Vignan's Foundation for Science, Technology, and Research, Guntur, 522213, Andhra Pradesh, India.
Sci Rep. 2025 Aug 29;15(1):31850. doi: 10.1038/s41598-025-14306-3.
Deep learning models have shown remarkable success in disease detection and classification tasks, but they lack transparency in their decision-making process, creating reliability and trust issues. Traditional evaluation methods focus solely on performance metrics such as classification accuracy, precision, and recall, and fail to assess whether the models rely on relevant features for decision-making. The main objective of this work is to develop and validate a comprehensive three-stage methodology that combines conventional performance evaluation with qualitative and quantitative evaluation of explainable artificial intelligence (XAI) visualizations, assessing both the accuracy and reliability of deep learning models. Eight pre-trained deep learning models (ResNet50, InceptionResNetV2, DenseNet201, InceptionV3, EfficientNetB0, Xception, VGG16, and AlexNet) were evaluated using this three-stage methodology. First, the models are assessed using traditional classification metrics. Second, Local Interpretable Model-agnostic Explanations (LIME) is employed to visualize and quantitatively evaluate feature selection using metrics such as Intersection over Union (IoU) and the Dice Similarity Coefficient (DSC). Third, a novel overfitting ratio metric is introduced to quantify the models' reliance on insignificant features. In the experimental analysis, ResNet50 emerged as both the most accurate model, achieving 99.13% classification accuracy, and the most reliable, demonstrating superior feature selection capabilities (IoU: 0.432, overfitting ratio: 0.284). Despite their high classification accuracies, models such as InceptionV3 and EfficientNetB0 showed poor feature selection capabilities, with low IoU scores (0.295 and 0.326, respectively) and high overfitting ratios (0.544 and 0.458), indicating potential reliability issues in real-world applications.
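The IoU and DSC metrics named above are standard overlap measures between two binary masks (here, the LIME-highlighted region versus a ground-truth annotation). A minimal sketch of their textbook definitions, assuming NumPy arrays as masks (this is not the paper's exact implementation, whose pre/post-processing the abstract does not describe):

```python
import numpy as np

def iou_and_dice(pred_mask, gt_mask):
    """Standard Intersection over Union and Dice Similarity Coefficient
    between two binary masks of the same shape."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 0.0
    dice = 2 * inter / (pred.sum() + gt.sum()) if (pred.sum() + gt.sum()) else 0.0
    return float(iou), float(dice)

# Toy 4x4 example: predicted explanation covers 4 pixels, ground truth 3,
# and they agree on 3 pixels.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
iou, dice = iou_and_dice(pred, gt)  # IoU = 3/4, DSC = 6/7
```

Both metrics range over [0, 1]; DSC weights the intersection more heavily, so it is always at least as large as IoU for non-empty masks.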
This study introduces a novel quantitative methodology for evaluating deep learning models that goes beyond traditional accuracy metrics, enabling more reliable and trustworthy AI systems for agricultural applications. This methodology is generic and researchers can explore the possibilities of extending it to other domains that require transparent and interpretable AI systems.
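The abstract does not give a formula for the proposed overfitting ratio, only that it quantifies reliance on insignificant features. One plausible (hypothetical, not confirmed by the source) formalization is the fraction of LIME-highlighted pixels that fall outside the ground-truth region of interest:

```python
import numpy as np

def overfitting_ratio(explanation_mask, gt_mask):
    """Hypothetical formalization: share of the explanation that lies
    outside the annotated relevant region (0 = fully relevant,
    1 = entirely irrelevant). Not necessarily the paper's definition."""
    exp = explanation_mask.astype(bool)
    gt = gt_mask.astype(bool)
    total = exp.sum()
    if total == 0:
        return 0.0
    outside = np.logical_and(exp, ~gt).sum()
    return float(outside / total)

# Toy example: 4 highlighted pixels, 1 outside the ground-truth region.
exp = np.array([[1, 1],
                [1, 1]])
gt = np.array([[1, 1],
               [1, 0]])
ratio = overfitting_ratio(exp, gt)  # 1/4 = 0.25
```

Under this reading, the reported values (e.g. 0.544 for InceptionV3 versus 0.284 for ResNet50) would mean that more than half of InceptionV3's attributed evidence lies on insignificant regions.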