Zahari Rahimi, Cox Julie, Obara Boguslaw
School of Computing, Newcastle University, Newcastle upon Tyne, UK.
County Durham and Darlington NHS Foundation Trust, County Durham, UK.
Comput Biol Med. 2025 Apr;188:109825. doi: 10.1016/j.compbiomed.2025.109825. Epub 2025 Feb 19.
Uncertainty quantification is crucial in deep learning, especially in medical diagnostics, to measure model prediction confidence and ensure reliable clinical decisions. This study introduces a novel conflict-based uncertainty quantification approach, applied as a case study in lung cancer classification, leveraging Dempster-Shafer Theory in conjunction with Deep Ensemble methods. The proposed method aggregates predictions from multiple neural network models using conflict as an uncertainty measure. By converting softmax outputs into Basic Belief Assignments and applying the rule of combination, this conflict-based method effectively quantifies uncertainty: high conflict values indicate predictions requiring expert review, while low conflict values mark predictions that can be treated as reliable. Evaluations on the LIDC-IDRI dataset and additional 3D biomedical datasets show that the proposed method achieved high accuracy (0.957) and U (0.819) for lung cancer classification. The sensitivity analysis further revealed that increasing the ensemble size improved performance, although the added computational demands may challenge real-time applications. In contrast, the entropy-based smoothing effect limits the accuracy improvement of traditional Deep Ensemble methods. In addition, Out-of-Distribution detection with the proposed method achieved AUC scores up to 0.864 across various datasets. Future work will focus on optimising efficiency and exploring alternative Dempster-Shafer Theory combination rules and hybrid models.
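The conflict measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how softmax outputs are mapped to Basic Belief Assignments, so the sketch assumes the simplest mapping, in which each softmax probability becomes the mass of the corresponding singleton class (a Bayesian BBA). The function names, the example probabilities, and the review threshold of 0.5 are hypothetical and chosen only for illustration.

```python
from math import prod


def dempster_combine(bbas):
    """Combine Bayesian BBAs (mass on singleton classes only) with Dempster's rule.

    bbas: list of per-model mass vectors, one mass per class, each summing to 1.
    Returns (combined_masses, conflict), where conflict K is the total mass
    that the sources jointly assign to incompatible (different) classes.
    """
    n_classes = len(bbas[0])
    # For singleton-only BBAs, the only non-empty intersections are the
    # cases where every model backs the same class.
    agreement = [prod(m[c] for m in bbas) for c in range(n_classes)]
    total_agreement = sum(agreement)
    conflict = 1.0 - total_agreement
    if total_agreement == 0.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    # Normalise by (1 - K) so the combined masses sum to 1.
    combined = [a / total_agreement for a in agreement]
    return combined, conflict


REVIEW_THRESHOLD = 0.5  # hypothetical cut-off, not taken from the paper


def classify_with_conflict(softmax_outputs, threshold=REVIEW_THRESHOLD):
    """Return (predicted_class, conflict, needs_expert_review)."""
    combined, conflict = dempster_combine(softmax_outputs)
    predicted = max(range(len(combined)), key=combined.__getitem__)
    return predicted, conflict, conflict > threshold


# Illustrative two-class softmax outputs from a three-model ensemble.
agreeing = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]]
disagreeing = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

print(classify_with_conflict(agreeing))     # conflict is about 0.315, not flagged
print(classify_with_conflict(disagreeing))  # conflict is about 0.86, flagged for review
```

Under this singleton-only assumption the conflict is nonzero even when the models agree, because any probability mass left on the other class counts as disagreement. A practical threshold would therefore need to be calibrated on validation data rather than fixed in advance.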