Grabke Emerson P, Heming Carolina A M, Hadari Amit, Finelli Antonio, Ghai Sangeet, Lajkosz Katherine, Taati Babak, Haider Masoom A
Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Canada.
Institute of Biomedical Engineering, University of Toronto, Toronto, Canada.
Abdom Radiol (NY). 2025 May 30. doi: 10.1007/s00261-025-05019-2.
To train and evaluate a machine learning triaging tool that identifies prostate MRI exams negative for clinically significant prostate cancer, and to compare its performance against non-MRI risk models.
In this retrospective study, 2895 MRIs were collected from two sources (1630 internal, 1265 public). The risk models compared were: Prostate Cancer Prevention Trial Risk Calculator 2.0, Prostate Biopsy Collaborative Group Calculator, PSA density, U-Net segmentation, and U-Net combined with clinical parameters. The reference standard was histopathology or negative follow-up. Performance metrics were calculated on a test set of 465 patients by simulating a triaging workflow and comparing it with a radiologist interpreting all exams. Sensitivity and specificity differences were assessed using the McNemar test. Differences in PPV and NPV were assessed using the Leisenring, Alonzo and Pepe generalized score statistic. Equivalence test p-values were adjusted within each measure using Benjamini-Hochberg correction.
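The simulated triaging workflow and two of the statistical steps named above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: the model emits a per-exam score, exams scoring below a hypothetical `threshold` are ruled out without a radiologist read, the McNemar test is computed in its exact (binomial) form, and all function names are illustrative. The Leisenring, Alonzo and Pepe statistic for PPV and NPV is not covered here.

```python
from math import comb

def triage_calls(radiologist_pos, model_scores, threshold):
    """Simulated triage (assumed rule): an exam scoring below `threshold` is
    ruled out by the model; otherwise the radiologist's call is kept."""
    return [bool(r) and s >= threshold
            for r, s in zip(radiologist_pos, model_scores)]

def workload_reduction(model_scores, threshold):
    """Fraction of exams the radiologist no longer has to read."""
    return sum(s < threshold for s in model_scores) / len(model_scores)

def sens_spec(calls, truth):
    """Sensitivity and specificity of binary calls against the reference standard."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    pos = sum(1 for t in truth if t)
    return tp / pos, tn / (len(truth) - pos)

def mcnemar_exact(calls_a, calls_b, subset):
    """Two-sided exact McNemar p-value on discordant pairs within `subset`.
    Use the diseased patients for sensitivity, the non-diseased for specificity."""
    b = sum(1 for a, c, m in zip(calls_a, calls_b, subset) if m and a and not c)
    c = sum(1 for a, c, m in zip(calls_a, calls_b, subset) if m and not a and c)
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data (made up for illustration, not from the study).
truth       = [True, True, True, False, False, False, False, False]
radiologist = [True, True, True, True, True, False, True, False]
scores      = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.6, 0.05]

triaged = triage_calls(radiologist, scores, threshold=0.25)
print(workload_reduction(scores, 0.25))
print(sens_spec(radiologist, truth), sens_spec(triaged, truth))
print(mcnemar_exact(radiologist, triaged, truth))
```

In this toy example the triage rule removes one radiologist false positive but also one true positive, which mirrors the sensitivity and specificity trade-off that the study quantifies on its real test set.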
Triaging using U-Net with clinical parameters reduced radiologist workload by 12.5%, with a decrease in sensitivity from 93% to 90% (p = 0.023) and an increase in specificity from 39% to 47% (p < 0.001). This simulated workload reduction was greater than that achieved by triaging with the risk calculators (3.2% and 1.3%, p < 0.001), and comparable to that of PSA density (8.4%, p = 0.071) and U-Net alone (11.6%, p = 0.762). Both U-Net triaging strategies increased PPV (+2.8%, p = 0.005 with clinical parameters; +2.2%, p = 0.020 without), unlike the non-U-Net strategies (p > 0.05). NPV remained equivalent for all scenarios (p > 0.05). Clinically informed U-Net triaging correctly ruled out 20 (13.4%) radiologist false positives (12 PI-RADS 3, 8 PI-RADS 4). Of the eight (3.6%) false negatives, two were misclassified by the radiologist. No misclassified case was interpreted as PI-RADS 5.
Prostate MRI triaging using machine learning could reduce radiologist workload by 12.5% with a 3-percentage-point decrease in sensitivity and an 8-percentage-point increase in specificity, outperforming triaging using non-imaging-based risk models. Further prospective validation is required.