IMPORTANCE

Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.

OBJECTIVE

To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.

DESIGN, SETTING, AND PARTICIPANTS

In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016.

MAIN OUTCOMES AND MEASUREMENTS

Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to a yes/no prediction of cancer within 12 months. Algorithm accuracy for breast cancer detection was evaluated using the area under the curve, and algorithm specificity was compared with radiologists' specificity at the radiologists' sensitivity of 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated.
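The two measures named above can be illustrated with a minimal Python sketch. It is not the challenge's scoring code: the function names, the toy scores, and the tie-handling and thresholding conventions are assumptions made for illustration only. The area under the curve is computed with the rank-sum identity, and specificity is read off at the highest score threshold that still reaches a chosen sensitivity.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Specificity at the highest threshold whose sensitivity meets the target."""
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Smallest number of cancers that must be flagged to reach the target
    # (the small epsilon guards against floating-point round-up).
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[k - 1]  # an exam is flagged when score >= threshold
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Hypothetical toy data: 1 = cancer within 12 months, 0 = no cancer.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]

print(auc(toy_scores, toy_labels))                               # 0.75
print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))  # 0.75
print(specificity_at_sensitivity(toy_scores, toy_labels, 1.0))   # 0.25
```

Fixing sensitivity at the radiologists' level, as the study did, puts the algorithm and the radiologists at the same cancer detection rate, so any difference in specificity reflects false-positive recalls alone.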

RESULTS

Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity.

CONCLUSIONS AND RELEVANCE

While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.
