
External Validation of an Ensemble Model for Automated Mammography Interpretation by Artificial Intelligence.

Author affiliations

Medical and Imaging Informatics, Department of Radiological Sciences, David Geffen School of Medicine at University of California, Los Angeles.

Clinical Research Division, Fred Hutchinson Cancer Center, Seattle, Washington.

Publication information

JAMA Netw Open. 2022 Nov 1;5(11):e2242343. doi: 10.1001/jamanetworkopen.2022.42343.

DOI:10.1001/jamanetworkopen.2022.42343
PMID:36409497
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9679879/
Abstract

IMPORTANCE

With a shortfall in fellowship-trained breast radiologists, mammography screening programs are looking toward artificial intelligence (AI) to increase efficiency and diagnostic accuracy. External validation studies provide an initial assessment of how promising AI algorithms perform in different practice settings.

OBJECTIVE

To externally validate an ensemble deep-learning model using data from a high-volume, distributed screening program of an academic health system with a diverse patient population.

DESIGN, SETTING, AND PARTICIPANTS

In this diagnostic study, an ensemble learning method, which reweights outputs of the 11 highest-performing individual AI models from the Digital Mammography Dialogue on Reverse Engineering Assessment and Methods (DREAM) Mammography Challenge, was used to predict the cancer status of an individual using a standard set of screening mammography images. This study was conducted using retrospective patient data collected between 2010 and 2020 from women aged 40 years and older who underwent a routine breast screening examination and participated in the Athena Breast Health Network at the University of California, Los Angeles (UCLA).
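
The idea of reweighting the outputs of several models can be illustrated with a minimal sketch. The abstract does not say how the weights are obtained, so the example below assumes a plain normalized weighted average; the function name `ensemble_score`, the example scores, and the example weights are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def ensemble_score(model_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-model cancer scores into one score by a weighted average.

    Assumption: a normalized weighted average stands in for the study's
    reweighting scheme, which the abstract does not describe.
    """
    if len(model_scores) != len(weights):
        raise ValueError("need exactly one weight per model score")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Each model contributes in proportion to its weight.
    weighted_sum = sum(w * s for w, s in zip(weights, model_scores))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores from three models for one examination
    # (the study combined 11 models; three are used here for brevity).
    scores = [0.2, 0.6, 0.4]
    weights = [1.0, 2.0, 1.0]
    print(round(ensemble_score(scores, weights), 3))
```

With equal weights the function reduces to a simple mean of the model scores, so any gain from an ensemble of this form would come from how the weights are chosen.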

MAIN OUTCOMES AND MEASURES

The performance of the challenge ensemble method (CEM) and of the CEM combined with radiologist assessment (CEM+R) was evaluated against diagnoses of ductal carcinoma in situ and invasive cancer made within a year of the screening examination, using metrics such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC).
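
For readers less familiar with these metrics, the following sketch shows one standard way to compute them from binary labels and continuous scores. It is an illustration only, not the study's analysis code: the function names, the toy data, and the 0.45 threshold are hypothetical. AUROC is computed here as the fraction of (cancer, non-cancer) pairs in which the cancer case receives the higher score, with ties counted as one half.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= threshold is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count pairs where the positive case outranks the negative case.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = cancer diagnosed within a year, 0 = no cancer.
    labels = [1, 1, 0, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.2, 0.1]
    print(sensitivity_specificity(labels, scores, threshold=0.45))
    print(auroc(labels, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported side by side. The pairwise AUROC loop is quadratic in the number of examinations and is meant for illustration; a rank-based implementation would scale better to a cohort of tens of thousands.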

RESULTS

Evaluated on 37 317 examinations from 26 817 women (mean [SD] age, 58.4 [11.5] years), individual model AUROC estimates ranged from 0.77 (95% CI, 0.75-0.79) to 0.83 (95% CI, 0.81-0.85). The CEM model achieved an AUROC of 0.85 (95% CI, 0.84-0.87) in the UCLA cohort, lower than the performance achieved in the Kaiser Permanente Washington (AUROC, 0.90) and Karolinska Institute (AUROC, 0.92) cohorts. The CEM+R model achieved a sensitivity (0.813 [95% CI, 0.781-0.843] vs 0.826 [95% CI, 0.795-0.856]; P = .20) and specificity (0.925 [95% CI, 0.916-0.934] vs 0.930 [95% CI, 0.929-0.932]; P = .18) similar to the radiologist performance. The CEM+R model had significantly lower sensitivity (0.596 [95% CI, 0.466-0.717] vs 0.850 [95% CI, 0.766-0.923]; P < .001) and specificity (0.803 [95% CI, 0.734-0.861] vs 0.945 [95% CI, 0.936-0.954]; P < .001) than the radiologist in women with a prior history of breast cancer and Hispanic women (0.894 [95% CI, 0.873-0.910] vs 0.926 [95% CI, 0.919-0.933]; P = .004).

CONCLUSIONS AND RELEVANCE

This study found that the high performance of an ensemble deep-learning model for automated screening mammography interpretation did not generalize to a more diverse screening cohort, suggesting that the model experienced underspecification. This study suggests the need for model transparency and fine-tuning of AI models for specific target populations prior to their clinical adoption.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3062/9679879/67ab97bc423a/jamanetwopen-e2242343-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3062/9679879/2fa23d1eb121/jamanetwopen-e2242343-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3062/9679879/161f37ed5ad6/jamanetwopen-e2242343-g002.jpg
