IEEE J Biomed Health Inform. 2024 Nov;28(11):6828-6839. doi: 10.1109/JBHI.2024.3446040. Epub 2024 Nov 6.
In radiology, particularly in lung cancer diagnosis, diagnostic errors and cognitive biases pose substantial challenges. These issues, which include perceptual errors, interpretive mistakes, and cognitive biases such as anchoring and premature closure, often go unnoticed even by experienced radiologists. To address these challenges, we propose the Multi-Eyes principle approach, which utilises multiple deep learning models to reduce bias and potentially improve diagnostic accuracy. Inspired by the Four-Eyes principle in business and cybersecurity, this methodology employs various 3D deep learning architectures, with 2D architectures used for validation, and three uncertainty quantification techniques: Monte Carlo Dropout, Deep Ensemble, and Ensemble Monte Carlo Dropout. Each model functions as an independent reviewer, similar to a blind review. Entropy is selected as the uncertainty measure and averaged, followed by ensemble averaging of the predictions. The effectiveness of this approach was demonstrated on the LIDC-IDRI dataset for lung cancer classification. Statistical analysis of the uncertainty distribution reveals that, as more models are added, the uncertainty of incorrect predictions becomes more peaked and left-skewed, indicating consensus on uncertainty levels. This results in accuracy and F1 score improvements, even with the best-performing model, and addresses overconfidence in single-model systems. These findings highlight the potential of the Multi-Eyes principle to significantly improve diagnostic performance in computer-aided diagnostic systems. Future research may explore different uncertainty quantification methods and feedback mechanisms for further advancement.
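The entropy averaging and ensemble averaging steps described above can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything here is an assumption: the function names `predictive_entropy` and `multi_eyes_combine`, the array layout `(n_models, n_passes, n_samples, n_classes)`, the use of Shannon entropy in nats, and the choice to average entropy per model before averaging across models. Under this reading, one model with several stochastic passes would correspond to Monte Carlo Dropout, several models with one pass each to a Deep Ensemble, and several of both to Ensemble Monte Carlo Dropout. The toy probabilities are made up for illustration and are not results from the paper.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of class probabilities along the last axis."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.sum(probs * np.log(probs), axis=-1)


def multi_eyes_combine(passes):
    """Combine softmax outputs from several independent "reviewer" models.

    passes: array-like of shape (n_models, n_passes, n_samples, n_classes).
    Returns (labels, ensemble_probs, mean_entropy).
    Hypothetical helper: the layout and averaging order are assumptions.
    """
    passes = np.asarray(passes, dtype=float)
    # Average the stochastic passes of each model (the MC Dropout step).
    per_model = passes.mean(axis=1)                 # (n_models, n_samples, n_classes)
    # Entropy of each model's predictive distribution, averaged over models.
    mean_entropy = predictive_entropy(per_model).mean(axis=0)  # (n_samples,)
    # Ensemble averaging of the predictions across models.
    ensemble_probs = per_model.mean(axis=0)         # (n_samples, n_classes)
    labels = ensemble_probs.argmax(axis=-1)
    return labels, ensemble_probs, mean_entropy


# Toy input: 2 models, 2 passes each, 2 nodules, 2 classes (illustrative only).
toy_passes = [
    [[[0.9, 0.1], [0.5, 0.5]], [[0.7, 0.3], [0.5, 0.5]]],  # model 0
    [[[0.8, 0.2], [0.4, 0.6]], [[0.8, 0.2], [0.4, 0.6]]],  # model 1
]
labels, ensemble_probs, mean_entropy = multi_eyes_combine(toy_passes)
print(labels, ensemble_probs, mean_entropy)
```

In this sketch, a higher `mean_entropy` flags a sample on which the models are collectively less certain, so it could be used to route that case for human review; whether the authors use the averaged entropy this way is not stated in the abstract.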