Oviedo Felipe, Kazerouni Anum S, Liznerski Philipp, Xu Yixi, Hirano Michael, Vandermeulen Robert A, Kloft Marius, Blum Elyse, Alessio Adam M, Li Christopher I, Weeks William B, Dodhia Rahul, Lavista Ferres Juan M, Rahbar Habib, Partridge Savannah C
AI for Good Lab, Microsoft, 1 Microsoft Way, 4330 150th Ave NE, Redmond, WA 98052.
Department of Radiology, University of Washington School of Medicine, Seattle, Wash.
Radiology. 2025 Jul;316(1):e241629. doi: 10.1148/radiol.241629.
Background: Artificial intelligence (AI) models hold potential to increase the accuracy and efficiency of breast MRI screening; however, existing models have not been rigorously evaluated in populations with low cancer prevalence and lack interpretability, both of which are essential for clinical adoption.

Purpose: To develop an explainable AI model for cancer detection at breast MRI that is effective in both high- and low-cancer-prevalence settings.

Materials and Methods: This retrospective study included 9738 breast MRI examinations from a single institution (2005-2022), with external testing in a publicly available multicenter dataset (221 examinations). In total, 9567 consecutive examinations were used to develop an explainable fully convolutional data description (FCDD) anomaly detection model to detect malignancies on contrast-enhanced MRI scans. Performance was evaluated in three cohorts: grouped cross-validation (for both balanced [20.0% malignant] and imbalanced [1.85% malignant] detection tasks), an internal independent test set (171 examinations), and an external dataset. Explainability was assessed through pixelwise comparisons with reference-standard malignancy annotations. Statistical significance was assessed using the Wilcoxon signed rank test.

Results: FCDD outperformed the benchmark binary cross-entropy (BCE) model in cross-validation for both balanced (mean area under the receiver operating characteristic curve [AUC] = 0.84 ± 0.01 [SD] vs 0.81 ± 0.01; P < .001) and imbalanced (mean AUC = 0.72 ± 0.03 vs 0.69 ± 0.03; P < .001) detection tasks. At a fixed 97% sensitivity in the imbalanced setting, mean specificity across folds was 13% for FCDD and 9% for BCE (P = .02). In the internal test set, FCDD outperformed BCE for balanced (mean AUC = 0.81 ± 0.02 vs 0.72 ± 0.02; P < .001) and imbalanced (mean AUC = 0.78 ± 0.05 vs 0.76 ± 0.01; P < .02) detection tasks. For model explainability, FCDD demonstrated better spatial agreement with reference-standard annotations than BCE (internal test set: mean pixelwise AUC = 0.92 ± 0.10 vs 0.81 ± 0.13; P < .001). External testing confirmed that FCDD performed well, and better than BCE, in the balanced detection task (AUC = 0.86 ± 0.01 vs 0.79 ± 0.01; P < .001).

Conclusion: The developed explainable AI model for cancer detection at breast MRI accurately depicted tumor location and outperformed commonly used models in both high- and low-cancer-prevalence scenarios.

© RSNA, 2025. See also the editorial by Bae and Ham in this issue.
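For readers unfamiliar with FCDD, the following is a minimal sketch of the published FCDD objective (Liznerski et al, "Explainable Deep One-Class Classification"), on which the model above is based. It assumes PyTorch and a fully convolutional backbone whose output feature map is passed in; the paper's actual architecture, preprocessing, and hyperparameters are not given in the abstract, so this illustrates only the loss formulation.

```python
import torch


def fcdd_loss(feature_map: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """FCDD pseudo-Huber objective (sketch).

    feature_map: (B, 1, U, V) output phi(X) of a fully convolutional network.
    labels:      (B,) with 0 = no malignancy (normal), 1 = malignant (anomaly).
    """
    # Elementwise pseudo-Huber transform: A(X) = sqrt(phi(X)^2 + 1) - 1
    a = torch.sqrt(feature_map ** 2 + 1.0) - 1.0
    # Mean anomaly score per examination over the U x V output grid
    score = a.flatten(1).mean(dim=1)
    y = labels.float()
    # Normal examinations pull the score toward 0; anomalous examinations
    # push it up via -log(1 - exp(-score)). Clamp avoids log(0).
    pos_term = -torch.log((-torch.expm1(-score)).clamp_min(1e-6))
    return ((1.0 - y) * score + y * pos_term).mean()
```

In the FCDD formulation, the low-resolution map A(X) itself serves as the explanation: upsampled to image resolution, it is the anomaly heatmap that can be compared pixelwise against radiologist annotations, which is what makes the approach explainable by construction rather than post hoc.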
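The reported operating-point and fold-wise comparisons can be reproduced with standard tooling. The sketch below, assuming scikit-learn and SciPy, shows one plausible way to compute specificity at a fixed 97% sensitivity from an ROC curve and to run the paired Wilcoxon signed rank test across cross-validation folds; the function names and the exact interpolation choices are illustrative, not taken from the authors' code.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import roc_curve


def specificity_at_sensitivity(y_true, y_score, target: float = 0.97) -> float:
    # ROC over examination-level labels (1 = malignant) and model scores
    fpr, tpr, _ = roc_curve(y_true, y_score)
    # First operating point whose sensitivity (TPR) reaches the target
    idx = int(np.argmax(tpr >= target))
    return 1.0 - fpr[idx]


def compare_models_across_folds(metric_model_a, metric_model_b):
    # Paired, nonparametric comparison of per-fold metrics (e.g. AUCs)
    # between two models, as in the abstract's Wilcoxon signed rank test
    return wilcoxon(metric_model_a, metric_model_b)
```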
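Likewise, the pixelwise explainability metric can be read as an ordinary AUC in which each pixel of the model's anomaly heatmap is a score and each pixel of the reference-standard malignancy annotation is a label. A minimal sketch under that assumption, with illustrative names:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pixelwise_auc(heatmap: np.ndarray, annotation: np.ndarray) -> float:
    # heatmap:    model anomaly scores upsampled to image resolution
    # annotation: binary reference-standard malignancy mask, same shape
    return roc_auc_score(annotation.ravel().astype(int), heatmap.ravel())
```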