
External Validation of an Ensemble Model for Automated Mammography Interpretation by Artificial Intelligence.

Author affiliations

Medical and Imaging Informatics, Department of Radiological Sciences, David Geffen School of Medicine at University of California, Los Angeles.

Clinical Research Division, Fred Hutchinson Cancer Center, Seattle, Washington.

Publication information

JAMA Netw Open. 2022 Nov 1;5(11):e2242343. doi: 10.1001/jamanetworkopen.2022.42343.

DOI:10.1001/jamanetworkopen.2022.42343

PMID:36409497

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9679879/

Abstract

IMPORTANCE

With a shortfall in fellowship-trained breast radiologists, mammography screening programs are looking toward artificial intelligence (AI) to increase efficiency and diagnostic accuracy. External validation studies provide an initial assessment of how promising AI algorithms perform in different practice settings.

OBJECTIVE

To externally validate an ensemble deep-learning model using data from a high-volume, distributed screening program of an academic health system with a diverse patient population.

DESIGN, SETTING, AND PARTICIPANTS

In this diagnostic study, an ensemble learning method, which reweights outputs of the 11 highest-performing individual AI models from the Digital Mammography Dialogue on Reverse Engineering Assessment and Methods (DREAM) Mammography Challenge, was used to predict the cancer status of an individual using a standard set of screening mammography images. This study was conducted using retrospective patient data collected between 2010 and 2020 from women aged 40 years and older who underwent a routine breast screening examination and participated in the Athena Breast Health Network at the University of California, Los Angeles (UCLA).
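The idea of reweighting the outputs of several models can be illustrated with a minimal Python sketch. It is not the CEM implementation: the abstract does not specify the meta-learner, so the weighted sum of logits, the function names (`logit`, `sigmoid`, `ensemble_score`), the equal weights, and the example probabilities below are all assumptions made for illustration.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1 for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_score(model_probs, weights, bias=0.0):
    """Combine per-model cancer probabilities via a weighted sum of logits.

    This is one common way to reweight model outputs (assumed here,
    not taken from the study).
    """
    if len(model_probs) != len(weights):
        raise ValueError("one weight per model is required")
    z = bias + sum(w * logit(p) for w, p in zip(weights, model_probs))
    return sigmoid(z)


# Hypothetical outputs of 11 models for one examination (made-up numbers).
probs = [0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.12, 0.18, 0.22, 0.08, 0.16]
# Equal weights as a placeholder; in practice they would be fitted on held-out data.
weights = [1.0 / len(probs)] * len(probs)
score = ensemble_score(probs, weights)
print(f"ensemble score: {score:.3f}")
```

With weights that sum to 1 and no bias, identical model outputs pass through unchanged, which makes the placeholder easy to sanity-check before substituting fitted weights.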

MAIN OUTCOMES AND MEASURES

Performance of the challenge ensemble method (CEM) and of the CEM combined with radiologist assessment (CEM+R) was compared against diagnoses of ductal carcinoma in situ and invasive cancer made within a year of the screening examination, using metrics such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC).
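For readers less familiar with these metrics, the following Python sketch shows how sensitivity, specificity, and AUROC can be computed from binary labels and model scores. The labels, scores, and the 0.5 threshold are made-up toy values, not study data, and the function names are illustrative.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is quadratic in sample size,
    so it suits small examples only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = cancer diagnosed within a year, 0 = no cancer.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auroc(labels, scores)
print(sens, spec, area)
```

On this toy set, two of three cancers exceed the threshold (sensitivity 2/3), four of five negatives fall below it (specificity 0.8), and 12 of the 15 positive-negative pairs are ranked correctly (AUROC 0.8).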

RESULTS

Evaluated on 37 317 examinations from 26 817 women (mean [SD] age, 58.4 [11.5] years), individual model AUROC estimates ranged from 0.77 (95% CI, 0.75-0.79) to 0.83 (95% CI, 0.81-0.85). The CEM model achieved an AUROC of 0.85 (95% CI, 0.84-0.87) in the UCLA cohort, lower than the performance achieved in the Kaiser Permanente Washington (AUROC, 0.90) and Karolinska Institute (AUROC, 0.92) cohorts. The CEM+R model achieved a sensitivity (0.813 [95% CI, 0.781-0.843] vs 0.826 [95% CI, 0.795-0.856]; P = .20) and specificity (0.925 [95% CI, 0.916-0.934] vs 0.930 [95% CI, 0.929-0.932]; P = .18) similar to the radiologist performance. In women with a prior history of breast cancer, the CEM+R model had significantly lower sensitivity (0.596 [95% CI, 0.466-0.717] vs 0.850 [95% CI, 0.766-0.923]; P < .001) and specificity (0.803 [95% CI, 0.734-0.861] vs 0.945 [95% CI, 0.936-0.954]; P < .001) than the radiologist. It also performed significantly worse than the radiologist in Hispanic women (0.894 [95% CI, 0.873-0.910] vs 0.926 [95% CI, 0.919-0.933]; P = .004).
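The abstract reports 95% CIs but does not say how they were derived. One common choice, assumed here purely for illustration, is a percentile bootstrap over examinations. The Python sketch below uses made-up toy data and hypothetical function names; it is not the study's procedure.

```python
import random


def sensitivity(labels, preds):
    """Fraction of true positives among all positive cases."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    pos = sum(1 for y in labels if y == 1)
    return tp / pos


def bootstrap_ci(labels, preds, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 1 not in ys:
            continue  # sensitivity is undefined without positive cases
        ps = [preds[i] for i in idx]
        stats.append(metric(ys, ps))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 20 cancers (16 flagged) and 80 non-cancers (5 flagged).
labels = [1] * 20 + [0] * 80
preds = [1] * 16 + [0] * 4 + [1] * 5 + [0] * 75

point = sensitivity(labels, preds)
ci_low, ci_high = bootstrap_ci(labels, preds, sensitivity)
print(f"sensitivity {point:.3f} (95% CI, {ci_low:.3f}-{ci_high:.3f})")
```

Because women in this cohort can contribute more than one examination, a real analysis would likely resample at the patient level rather than the examination level; the sketch ignores that clustering.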

CONCLUSIONS AND RELEVANCE

This study found that the high performance of an ensemble deep-learning model for automated screening mammography interpretation did not generalize to a more diverse screening cohort, suggesting that the model experienced underspecification. This study suggests the need for model transparency and fine-tuning of AI models for specific target populations prior to their clinical adoption.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3062/9679879/2fa23d1eb121/jamanetwopen-e2242343-g001.jpg
