Cerekci Esma, Alis Deniz, Denizoglu Nurper, Camurdan Ozden, Ege Seker Mustafa, Ozer Caner, Hansu Muhammed Yusuf, Tanyel Toygar, Oksuz Ilkay, Karaarslan Ercan
Sisli Hamidiye Etfal Training and Research Hospital, Department of Radiology, Istanbul, Turkey.
Acibadem Mehmet Ali Aydinlar University, School of Medicine, Department of Radiology, Istanbul, Turkey.
Eur J Radiol. 2024 Apr;173:111356. doi: 10.1016/j.ejrad.2024.111356. Epub 2024 Feb 5.
Explainable Artificial Intelligence (XAI) plays a prominent role in making the decisions of opaque deep learning (DL) models interpretable, especially in medical imaging. Saliency methods are commonly used, yet quantitative evidence of their performance is lacking.
To quantitatively evaluate the performance of widely utilized saliency XAI methods in the task of breast cancer detection on mammograms.
Three radiologists drew ground-truth bounding boxes on a balanced mammogram dataset of women (n = 1,496 cancer-positive and 1,496 cancer-negative scans) from three centers. A modified, pre-trained DL model was employed for breast cancer detection using mediolateral oblique (MLO) and craniocaudal (CC) images. Three saliency XAI methods were evaluated: Gradient-weighted Class Activation Mapping (Grad-CAM), Grad-CAM++, and Eigen-CAM. We assessed these methods with the Pointing Game, which scores a hit when the maximum value of a saliency map falls inside a ground-truth bounding box; the resulting score is the fraction of correctly localized lesions among all cancer patients and ranges from 0 to 1 (see the sketch below).
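To make the metric concrete, here is a minimal Pointing Game sketch in Python. This is illustrative only, not the authors' code; the function names and the NumPy-based saliency map and box representations are assumptions.

```python
# Minimal Pointing Game sketch (illustrative; not the authors' implementation).
# Assumes each cancer-positive image has a 2D saliency map and one or more
# radiologist-drawn ground-truth boxes given as (x_min, y_min, x_max, y_max).
import numpy as np

def pointing_game_hit(saliency: np.ndarray, boxes) -> bool:
    """Return True if the saliency map's maximum lies inside any ground-truth box."""
    y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
    return any(x0 <= x <= x1 and y0 <= y <= y1 for (x0, y0, x1, y1) in boxes)

def pointing_game_score(saliency_maps, boxes_per_image) -> float:
    """Fraction of images whose saliency maximum lands in a ground-truth box (0 to 1)."""
    hits = [pointing_game_hit(s, b) for s, b in zip(saliency_maps, boxes_per_image)]
    return float(np.mean(hits))

# Toy example: a 100x100 map whose peak is forced inside the box (20, 30, 60, 70).
rng = np.random.default_rng(0)
saliency = rng.random((100, 100))
saliency[50, 40] = 2.0  # maximum at (x=40, y=50), inside the box
print(pointing_game_score([saliency], [[(20, 30, 60, 70)]]))  # prints 1.0
```

In practice the saliency maps would come from a CAM-style method (e.g., the pytorch-grad-cam package provides Grad-CAM, Grad-CAM++, and Eigen-CAM), resized to the input image resolution before the comparison against the boxes.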
The development sample included 2,244 women (75%); the remaining 748 women (25%) formed the testing set for unbiased XAI evaluation. The model's recall, precision, accuracy, and F1-score for identifying cancer in the testing set were 69%, 88%, 80%, and 0.77, respectively. The Pointing Game scores for Grad-CAM, Grad-CAM++, and Eigen-CAM were 0.41, 0.30, and 0.35 across all women with cancer, and 0.41, 0.31, and 0.36 when only true-positive samples were considered.
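As a consistency check, the reported F1-score follows from the stated precision (P) and recall (R):

\[ \text{F1} = \frac{2PR}{P+R} = \frac{2 \times 0.88 \times 0.69}{0.88 + 0.69} \approx 0.77 \]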
While saliency-based methods provide some degree of explainability, they fail to delineate how DL models arrive at their decisions in a considerable number of instances.