Ariyametkul Awika, Paing May Phu
Department of Biomedical Engineering, School of Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand.
Quant Imaging Med Surg. 2025 Jul 1;15(7):6252-6271. doi: 10.21037/qims-2024-2911. Epub 2025 Jun 30.
Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related mortality among women worldwide. The disease is particularly dangerous because it is asymptomatic in its early stages, underscoring the importance of early detection. Mammography, a specialized X-ray imaging technique for breast examination, has been pivotal in facilitating early detection and reducing mortality rates. In recent years, artificial intelligence (AI) has gained substantial popularity across various fields, including medicine. Numerous studies have leveraged AI techniques, particularly convolutional neural networks (CNNs) and You Only Look Once (YOLO)-based models, for medical image detection and classification. However, the predictions of such AI models often lack transparency and explainability, resulting in low trustworthiness. This study addresses this gap by investigating three state-of-the-art versions of the YOLO algorithm, namely YOLO version 9 (YOLOv9), YOLO version 10 (YOLOv10), and YOLO version 11 (YOLO11), trained on breast cancer imaging datasets, specifically the INbreast and Mammographic Image Analysis Society (MIAS) databases. Additionally, to address the lack of explainability and transparency, we integrate seven explainable artificial intelligence (XAI) methods: Grad-CAM, Grad-CAM++, Eigen-CAM, EigenGrad-CAM, XGrad-CAM, LayerCAM, and HiResCAM.
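As a rough illustration of how two of the CAM-family methods differ, Grad-CAM weights each activation map of the target layer by its spatially averaged gradient, whereas HiResCAM multiplies gradients and activations element-wise before summing over channels. The NumPy sketch below uses random arrays as stand-ins for a real model's activations and gradients; it shows the weighting schemes only, not the full pipeline of either method.

```python
import numpy as np

def grad_cam(acts, grads):
    """Grad-CAM: weight each activation map by its spatially
    averaged gradient, sum over channels, then apply ReLU."""
    weights = grads.mean(axis=(1, 2))            # (C,)
    cam = np.tensordot(weights, acts, axes=1)    # (H, W)
    return np.maximum(cam, 0)

def hires_cam(acts, grads):
    """HiResCAM: element-wise product of gradients and activations,
    summed over channels (no spatial averaging), then ReLU."""
    cam = (grads * acts).sum(axis=0)             # (H, W)
    return np.maximum(cam, 0)

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 4, 4))   # C=8 feature maps from the target layer
grads = rng.standard_normal((8, 4, 4))  # gradients of the class score w.r.t. acts

print(grad_cam(acts, grads).shape)   # (4, 4)
print(hires_cam(acts, grads).shape)  # (4, 4)
```

When the gradient is spatially constant within each channel, the two formulations coincide; HiResCAM's element-wise product is what preserves high-resolution spatial faithfulness when it is not.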
This study utilized two publicly available breast cancer image databases: INbreast: toward a Full-field Digital Mammographic Database and the MIAS dataset. Preprocessing steps were applied to standardize all images in accordance with the input requirements of the YOLO architecture before the datasets were used to train the three most recent versions of YOLO. The YOLO model demonstrating the highest performance, as measured by mean average precision (mAP), precision, and recall, was selected for integration with seven different XAI methods. The performance of each XAI technique was evaluated both qualitatively through visual inspection and quantitatively using several metrics, including matching ground truth (mGT), Pearson correlation coefficient (PCC), precision, recall, and root mean square error (RMSE). These methodologies were employed to interpret and visualize the "black box" decision-making processes of the top-performing YOLO model.
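Two of the quantitative metrics, PCC and RMSE, can be computed directly between a saliency map and a binary ground-truth lesion mask. A minimal sketch, assuming the saliency map is min-max scaled to [0, 1] before comparison (the abstract does not specify the paper's exact normalization, so that choice is illustrative):

```python
import numpy as np

def pcc(saliency, mask):
    """Pearson correlation between a saliency map and a ground-truth mask."""
    s, m = saliency.ravel(), mask.ravel().astype(float)
    return float(np.corrcoef(s, m)[0, 1])

def rmse(saliency, mask):
    """Root mean square error after min-max scaling the saliency map to [0, 1]."""
    s = saliency.ravel()
    s = (s - s.min()) / (np.ptp(s) + 1e-12)
    return float(np.sqrt(np.mean((s - mask.ravel().astype(float)) ** 2)))

# Toy example: a square "lesion" mask and a saliency map that mostly agrees with it.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1
good = mask + 0.05 * np.random.default_rng(1).random((8, 8))

print(round(pcc(good, mask), 3), round(rmse(good, mask), 3))
```

A saliency map that concentrates on the annotated lesion yields PCC near 1 and RMSE near 0; a map highlighting irrelevant tissue drives PCC toward 0 and RMSE up.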
Based on our experimental findings, YOLO11 outperformed YOLOv9 (mAP 0.868) and YOLOv10 (mAP 0.926), achieving the highest mAP of 0.935, with classification accuracies of 95% for benign and 80% for malignant cases. Among the evaluated XAI techniques, HiResCAM provided the most effective visual explanations, attaining the highest mGT score of 0.49, surpassing EigenGrad-CAM (0.45) and LayerCAM (0.42) in both visual and quantitative evaluations.
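The detection-side precision and recall behind figures like these are conventionally computed by matching predicted boxes to ground-truth boxes at an intersection-over-union (IoU) threshold; mAP then averages precision over recall levels and classes. The sketch below shows only the standard greedy IoU matching step in plain Python (the threshold of 0.5 and the greedy strategy are common defaults, not details stated in the abstract):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def precision_recall(preds, gts, thr=0.5):
    """Greedily match each prediction to at most one unmatched
    ground-truth box whose IoU meets the threshold."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= thr:
                matched.add(i)
                tp += 1
                break
    return tp / len(preds), tp / len(gts)

preds = [(10, 10, 50, 50), (60, 60, 90, 90)]  # one true hit, one false alarm
gts = [(12, 12, 52, 52)]
p, r = precision_recall(preds, gts)
print(p, r)  # 0.5 1.0
```

Here the first prediction overlaps the ground truth well above the threshold (a true positive), while the second matches nothing, giving precision 0.5 and recall 1.0.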
The integration of YOLO11 with HiResCAM offers a robust solution that combines high detection accuracy with improved model interpretability. This approach not only enhances user trust by revealing decision-making patterns and limitations but also provides insight into the model's weaknesses, enabling developers to refine and improve AI performance further.