Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany.
Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany; Institute of Computational Biology, Helmholtz Center Munich, Munich, Germany; Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Center Munich, Munich, Germany.
Med Image Anal. 2022 Apr;77:102364. doi: 10.1016/j.media.2022.102364. Epub 2022 Jan 22.
Deep neural networks (DNNs) have achieved physician-level accuracy on many imaging-based medical diagnostic tasks, for example the classification of retinal images in ophthalmology. However, their decision mechanisms are often considered impenetrable, leading to a lack of trust by clinicians and patients. To alleviate this issue, a range of explanation methods have been proposed to expose the inner workings of DNNs that lead to their decisions. For imaging-based tasks, this is often achieved via saliency maps. The quality of these maps is typically evaluated via perturbation analysis without experts involved. To facilitate the adoption and success of such automated systems, however, it is crucial to validate saliency maps against clinicians. In this study, we used three different network architectures and developed ensembles of DNNs to detect diabetic retinopathy and neovascular age-related macular degeneration from retinal fundus images and optical coherence tomography scans, respectively. We used a variety of explanation methods and obtained a comprehensive set of saliency maps for explaining the ensemble-based diagnostic decisions. We then systematically validated the saliency maps against clinicians through two main analyses: a direct comparison of saliency maps with expert annotations of disease-specific pathologies, and perturbation analyses that also used expert annotations as saliency maps. We found that the choice of DNN architecture and explanation method significantly influences the quality of saliency maps. Guided Backprop showed consistently good performance across disease scenarios and DNN architectures, suggesting that it provides a suitable starting point for explaining the decisions of DNNs on retinal images.
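The abstract singles out Guided Backprop as the most consistently useful explanation method for the ensemble-based classifiers. Below is a minimal sketch of how an ensemble-level Guided Backprop saliency map could be computed with PyTorch and Captum; averaging the per-member maps, collapsing the channel dimension by absolute summation, and the final normalisation are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import torch
from captum.attr import GuidedBackprop


def ensemble_saliency(models, image, target):
    """Average Guided Backprop maps over an ensemble of trained classifiers.

    `models` is a list of torch.nn.Module ensemble members (e.g. the DR or
    nAMD detectors), `image` a (C, H, W) tensor, `target` the class index.
    Averaging per-member maps is one plausible aggregation scheme, assumed
    here for illustration.
    """
    maps = []
    for model in models:
        model.eval()
        # Guided Backprop attribution has the same shape as the input image
        attribution = GuidedBackprop(model).attribute(image.unsqueeze(0), target=target)
        # collapse colour channels into a single per-pixel relevance value
        maps.append(attribution.abs().sum(dim=1).squeeze(0))
    saliency = torch.stack(maps).mean(dim=0)
    # normalise to [0, 1] so the map can be compared against binary expert annotations
    return saliency / (saliency.max() + 1e-8)
```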
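The perturbation analyses mentioned in the abstract occlude the most salient image regions and measure how much the model's confidence drops, with either a model-derived map or an expert annotation serving as the saliency source. A hedged sketch follows; the `perturbation_curve` helper, the occlusion fractions, and the zero fill value are illustrative choices, not the study's exact protocol.

```python
import numpy as np
import torch


def perturbation_curve(model, image, saliency, fractions=(0.05, 0.1, 0.2, 0.5), fill=0.0):
    """Occlude the most salient pixels and track the predicted class probability.

    `saliency` is a 2-D array over the H x W pixel grid: either a saliency
    map or a binary expert annotation. A larger probability drop suggests the
    map points at regions the model actually relies on.
    """
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(image.unsqueeze(0)), dim=1)[0]
    cls = int(probs.argmax())
    base = probs[cls].item()

    order = np.argsort(saliency.ravel())[::-1]   # most salient pixels first
    drops = []
    for frac in fractions:
        k = int(frac * order.size)
        flat_mask = np.zeros(order.size, dtype=bool)
        flat_mask[order[:k]] = True
        mask = torch.from_numpy(flat_mask.reshape(saliency.shape))
        perturbed = image.clone()
        perturbed[:, mask] = fill                # image assumed to be (C, H, W)
        with torch.no_grad():
            p = torch.softmax(model(perturbed.unsqueeze(0)), dim=1)[0, cls].item()
        drops.append(base - p)
    return drops
```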