From the Department of Epidemiology, University of Hawaii Cancer Center, 701 Ilalo St, Suite 522, Honolulu, HI 96813 (X.Z., T.K.W., L.L., J.A.S.); Department of Information and Computer Sciences, University of Hawaii at Manoa, Honolulu, Hawaii (L.L., P.S.); Department of Health Sciences Research, Mayo Clinic, Rochester, Minn (M.J., C.S., S.W., C.V.); and Departments of Medicine and Epidemiology/Biostatistics, University of California, San Francisco, San Francisco, Calif (K.K.).
Radiology. 2021 Dec;301(3):550-558. doi: 10.1148/radiol.2021203758. Epub 2021 Sep 7.
Background The ability of deep learning (DL) models to classify women as at risk for either screening mammography-detected or interval cancer (not detected at mammography) has not yet been explored in the literature.

Purpose To examine the ability of DL models to estimate the risk of interval and screening-detected breast cancers with and without clinical risk factors.

Materials and Methods This study was performed on 25 096 digital screening mammograms obtained from January 2006 to December 2013. The mammograms were obtained in 6369 women who did not have breast cancer at the time of imaging. Of these women, 1609 later developed screening-detected breast cancer and 351 developed interval invasive breast cancer. A DL model was trained on the negative mammograms to classify women into those who did not develop cancer and those who developed screening-detected or interval invasive cancer. Model performance was evaluated with a matched concordance statistic (C statistic) in a held-out test set of 26% (1669 of 6369) of the women.

Results For patients with screening-detected cancer versus matched controls, the C statistics and odds ratios were 0.66 (95% CI: 0.63, 0.69) and 1.25 (95% CI: 1.17, 1.33), respectively, for the DL model; 0.62 (95% CI: 0.59, 0.65) and 2.14 (95% CI: 1.32, 3.45) for the model of clinical risk factors with Breast Imaging Reporting and Data System (BI-RADS) density; and 0.66 (95% CI: 0.63, 0.69) and 1.21 (95% CI: 1.13, 1.30) for the combined DL and clinical risk factors model. For patients with interval cancer versus controls, the C statistics and odds ratios were 0.64 (95% CI: 0.58, 0.71) and 1.26 (95% CI: 1.10, 1.45), respectively, for the DL model; 0.71 (95% CI: 0.65, 0.77) and 7.25 (95% CI: 2.94, 17.9) for the model of risk factors with BI-RADS density (b rated vs non-b rated); and 0.72 (95% CI: 0.66, 0.78) and 1.10 (95% CI: 0.94, 1.29) for the combined DL and clinical risk factors model. The P values for the difference in each model's ability to detect screening-detected versus interval cancer were .99 for the DL model, .002 for the BI-RADS model, and .03 for the combined model.

Conclusion Compared with clinical risk factors including breast density, the deep learning model performed better in determining screening-detected cancer risk but worse in determining interval cancer risk.

© RSNA, 2021. See also the editorial by Bae and Kim in this issue.
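The matched C statistic used to evaluate the models can be illustrated with a minimal sketch. It assumes a common definition: within each matched case-control set, every case is compared with every control, a pair counts as concordant when the case has the higher predicted risk, ties count as one half, and pairs are never formed across different matched sets. The function name `matched_c_statistic`, its input format, and the choice to pool all within-set pairs (rather than average per set) are illustrative assumptions, not the study's code.

```python
from itertools import product


def matched_c_statistic(matched_sets):
    """Hypothetical helper: matched concordance (C) statistic.

    matched_sets is an iterable of (case_scores, control_scores) tuples,
    one tuple per matched set (assumed input format). Only case-control
    pairs within the same set are compared; all such pairs are pooled.
    """
    concordant = 0.0
    pairs = 0
    for case_scores, control_scores in matched_sets:
        # Compare each case with each of its own matched controls.
        for case, control in product(case_scores, control_scores):
            pairs += 1
            if case > control:
                concordant += 1.0  # case ranked above its control
            elif case == control:
                concordant += 0.5  # ties contribute half a concordance
    if pairs == 0:
        raise ValueError("no case-control pairs to compare")
    return concordant / pairs


# Two matched sets, each with one case and two controls (made-up scores).
example_sets = [([0.8], [0.3, 0.5]), ([0.4], [0.6, 0.4])]
print(matched_c_statistic(example_sets))
```

In the example the first set contributes two concordant pairs and the second contributes one tie, giving 2.5 of 4 pairs, or 0.625. A value of 0.5 corresponds to chance ranking within matched sets, which is the reference point for C statistics such as 0.66 and 0.71 reported above.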