
Revisiting the Trustworthiness of Saliency Methods in Radiology AI.

Affiliations

From the Department of Biomedical Engineering, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, 110 8th St, Biotech 4231, Troy, NY 12180 (J.Z., H.C., G.W., P.Y.); and Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (G.D., M.K.K.).

Publication information

Radiol Artif Intell. 2024 Jan;6(1):e220221. doi: 10.1148/ryai.220221.

DOI:10.1148/ryai.220221
PMID:38166328
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10831523/
Abstract

Purpose: To determine whether saliency maps in radiology artificial intelligence (AI) are vulnerable to subtle perturbations of the input, which could lead to misleading interpretations, using prediction-saliency correlation (PSC) for evaluating the sensitivity and robustness of saliency methods.

Materials and Methods: In this retrospective study, locally trained deep learning models and a research prototype provided by a commercial vendor were systematically evaluated on 191 229 chest radiographs from the CheXpert dataset and 7022 MR images from a human brain tumor classification dataset. Two radiologists performed a reader study on 270 chest radiograph pairs. A model-agnostic approach for computing the PSC coefficient was used to evaluate the sensitivity and robustness of seven commonly used saliency methods.

Results: The saliency methods had low sensitivity (maximum PSC, 0.25; 95% CI: 0.12, 0.38) and weak robustness (maximum PSC, 0.12; 95% CI: 0.0, 0.25) on the CheXpert dataset, as demonstrated by leveraging locally trained model parameters. Further evaluation showed that the saliency maps generated from a commercial prototype could be irrelevant to the model output, without knowledge of the model specifics (area under the receiver operating characteristic curve decreased by 8.6% without affecting the saliency map). The human observer studies confirmed that it is difficult for experts to identify the perturbed images; the experts had less than 44.8% correctness.

Conclusion: Popular saliency methods scored low PSC values on the two datasets of perturbed chest radiographs, indicating weak sensitivity and robustness. The proposed PSC metric provides a valuable quantification tool for validating the trustworthiness of medical AI explainability.

Keywords: Saliency Maps, AI Trustworthiness, Dynamic Consistency, Sensitivity, Robustness

© RSNA, 2023. See also the commentary by Yanagawa and Sato in this issue.
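The abstract names the PSC coefficient but does not give its formula. As a rough illustration only, the sketch below assumes that a prediction-saliency correlation can be approximated by the Pearson correlation between how much the model output changes and how much the saliency map changes when each image is perturbed. The function names (`pearson`, `map_change`, `psc_sketch`), the use of absolute prediction difference, and the mean absolute difference between maps are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Convention chosen for this sketch: if one side never changes,
        # report zero correlation instead of dividing by zero.
        return 0.0
    return cov / math.sqrt(vx * vy)


def map_change(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two flattened saliency maps
    (an assumed distance; the paper may use a different one)."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def psc_sketch(pred_orig, pred_pert, sal_orig, sal_pert) -> float:
    """Correlate per-image prediction change with per-image saliency change."""
    d_pred = [abs(p - q) for p, q in zip(pred_orig, pred_pert)]
    d_sal = [map_change(a, b) for a, b in zip(sal_orig, sal_pert)]
    return pearson(d_pred, d_sal)


# Toy data: four images, each with a two-pixel "saliency map".
pred_orig = [0.9, 0.8, 0.7, 0.6]
pred_pert = [0.1, 0.5, 0.6, 0.6]
sal_orig = [[1.0, 0.0]] * 4
# Saliency maps that shift in step with the prediction changes.
sal_tracking = [[0.2, 0.8], [0.7, 0.3], [0.9, 0.1], [1.0, 0.0]]

print(psc_sketch(pred_orig, pred_pert, sal_orig, sal_tracking))
# Saliency maps that stay fixed while the predictions move.
print(psc_sketch(pred_orig, pred_pert, sal_orig, sal_orig))
```

In this toy setup the first call returns a value of approximately 1.0, because the saliency maps move together with the predictions. The second call returns 0.0, which mirrors the failure mode described in the Results: the model output changes while the saliency map does not.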

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6938/10831523/cefe654a0187/ryai.220221.VA.jpg
