School of Information Systems, Queensland University of Technology, Australia.
IBM Research AI, Bangalore, India.
Comput Biol Med. 2022 Jul;146:105587. doi: 10.1016/j.compbiomed.2022.105587. Epub 2022 May 8.
Recent years have seen deep neural networks (DNNs) gain widespread acceptance for a range of computer vision tasks, including medical imaging. Motivated by their performance, multiple studies have focused on designing deep convolutional neural network architectures tailored to detect COVID-19 cases from chest computed tomography (CT) images. However, a fundamental challenge of DNN models is their inability to explain the reasoning for a diagnosis. Explainability is essential for medical diagnosis, where understanding the reason for a decision is as important as the decision itself. A variety of algorithms have been proposed that generate explanations and strive to enhance users' trust in DNN models. Yet, the influence of the generated machine learning explanations on clinicians' trust for complex decision tasks in healthcare remains poorly understood. This study evaluates the quality of explanations generated for a deep learning model that detects COVID-19 based on CT images and examines the influence of the quality of these explanations on clinicians' trust. First, we collect radiologist-annotated explanations of the CT images for the diagnosis of COVID-19 to create the ground truth. We then compare ground truth explanations with machine learning explanations. Our evaluation shows that the explanations produced by different algorithms were often correct (high precision) when compared to the radiologist-annotated ground truth, but a significant number of explanations were missed (substantially lower recall). We further conduct a controlled experiment to study the influence of machine learning explanations on clinicians' trust for the diagnosis of COVID-19. Our findings show that while the clinicians' trust in automated diagnosis increases with the explanations, their reliance on the diagnosis decreases, as clinicians are less likely to rely on algorithms that are not close to human judgement. Clinicians want higher recall of the explanations for a better understanding of an automated diagnosis system.
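The abstract reports precision and recall of machine learning explanations against radiologist annotations. As a minimal sketch (not the authors' implementation), the comparison can be framed as a pixel-level overlap between a thresholded saliency map and an annotated region mask; the threshold and binarisation strategy below are illustrative assumptions.

```python
# Sketch only: pixel-level precision/recall between a thresholded saliency map
# (e.g. a Grad-CAM style heatmap) and a radiologist-annotated region mask.
# The 0.5 threshold is an assumption for illustration, not from the paper.
import numpy as np

def explanation_precision_recall(saliency, annotation, threshold=0.5):
    """saliency: 2-D float array in [0, 1]; annotation: 2-D binary mask."""
    pred = saliency >= threshold            # binarise the explanation
    truth = annotation.astype(bool)

    tp = np.logical_and(pred, truth).sum()   # highlighted AND annotated
    fp = np.logical_and(pred, ~truth).sum()  # highlighted but not annotated
    fn = np.logical_and(~pred, truth).sum()  # annotated but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: a heatmap covering only part of the annotated region gives
# high precision but low recall, the pattern reported in the study.
saliency = np.zeros((4, 4)); saliency[0, 0] = 0.9
annotation = np.zeros((4, 4), dtype=int); annotation[0, :2] = 1
print(explanation_precision_recall(saliency, annotation))  # (1.0, 0.5)
```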