Division of Radiology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany.
Heidelberg University Medical School, Heidelberg, Germany.
Eur Radiol. 2023 Nov;33(11):7463-7476. doi: 10.1007/s00330-023-09882-9. Epub 2023 Jul 28.
OBJECTIVES: To evaluate a fully automatic deep learning system to detect and segment clinically significant prostate cancer (csPCa) on same-vendor prostate MRI from two different institutions not contributing to training of the system.

MATERIALS AND METHODS: In this retrospective study, a previously bi-institutionally validated deep learning system (UNETM) was applied to bi-parametric prostate MRI data from one external institution (A), a PI-RADS distribution-matched internal cohort (B), and a csPCa-stratified subset of single-institution external public challenge data (C). csPCa was defined as ISUP Grade Group ≥ 2, determined from combined targeted and extended systematic MRI/transrectal US-fusion biopsy. Performance of UNETM was evaluated by comparing ROC AUC and specificity at typical PI-RADS sensitivity levels. Lesion-level analysis comparing UNETM segmentations with radiologist-delineated segmentations was performed using the Dice coefficient, free-response operating characteristic (FROC), and weighted alternative FROC (waFROC). The influence of using different diffusion sequences was analyzed in cohort A.

RESULTS: In 250/250/140 exams in cohorts A/B/C, differences in ROC AUC were not significant, with 0.80 (95% CI: 0.74-0.85)/0.87 (95% CI: 0.83-0.92)/0.82 (95% CI: 0.75-0.89). At sensitivities of 95% and 90%, UNETM achieved specificity of 30%/50% in A, 44%/71% in B, and 43%/49% in C, respectively. The Dice coefficient between UNETM and radiologist-delineated lesions was 0.36 in A and 0.49 in B. The waFROC AUC was 0.67 (95% CI: 0.60-0.83) in A and 0.70 (95% CI: 0.64-0.78) in B. UNETM performed marginally better on readout-segmented than on single-shot echo-planar imaging.

CONCLUSION: For same-vendor examinations, deep learning provided comparable discrimination of csPCa and non-csPCa lesions and examinations between local and two independent external data sets, demonstrating the applicability of the system to institutions not participating in model training.
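The Dice coefficient used in the lesion-level analysis above measures the overlap between two segmentations as 2|P ∩ R| / (|P| + |R|). The following is a minimal Python sketch of that formula, assuming both segmentations are given as flattened binary masks of equal length. The function name and the convention of returning 1.0 for two empty masks are illustrative assumptions and are not taken from the study.

```python
def dice_coefficient(pred_mask, ref_mask):
    """Dice overlap between two flattened binary masks (hypothetical helper).

    pred_mask: model segmentation, one 0/1 value per voxel
    ref_mask:  reference (e.g. radiologist) segmentation, same voxel order
    """
    if len(pred_mask) != len(ref_mask):
        raise ValueError("masks must contain the same number of voxels")

    pred = [bool(v) for v in pred_mask]
    ref = [bool(v) for v in ref_mask]

    # Voxels marked as lesion in both masks.
    overlap = sum(p and r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)

    # Assumption: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * overlap / total


if __name__ == "__main__":
    # Toy example: one shared voxel, two voxels in each mask.
    print(dice_coefficient([1, 1, 0, 0, 0, 0], [1, 0, 0, 1, 0, 0]))
```

A value of 1 means the two masks coincide exactly, and 0 means they share no voxels.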
CLINICAL RELEVANCE STATEMENT: A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets, indicating the potential of deploying AI models without retraining or fine-tuning, and corroborating evidence that AI models extract a substantial amount of transferable domain knowledge about MRI-based prostate cancer assessment.

KEY POINTS:
• A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets.
• Lesion detection performance and segmentation congruence were similar on the institutional and an external data set, as measured by the weighted alternative FROC AUC and Dice coefficient.
• Although the system generalized to two external institutions without re-training, achieving expected sensitivity and specificity levels with the deep learning system requires probability thresholds to be adjusted, underlining the importance of institution-specific calibration and quality control.
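The last key point, that probability thresholds need institution-specific adjustment, can be illustrated with a short Python sketch. It picks the highest threshold that reaches a target sensitivity on a local cohort and then reports the specificity at that threshold. The function names, the toy scores, and the rule "score ≥ threshold means positive" are assumptions made for illustration. The abstract does not describe how the study derived its operating points.

```python
import math


def threshold_for_sensitivity(scores, labels, target_sensitivity):
    """Highest threshold at which sensitivity >= target (hypothetical helper).

    An exam is called positive when its score is >= the threshold.
    labels: 1 for csPCa-positive exams, 0 for negative exams.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("at least one positive exam is required")
    # Number of positive exams that must be detected; the small epsilon
    # guards against floating-point noise in the product.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    return positives[needed - 1]


def specificity_at_threshold(scores, labels, threshold):
    """Fraction of negative exams scoring below the threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        raise ValueError("at least one negative exam is required")
    return sum(s < threshold for s in negatives) / len(negatives)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.5, 0.3, 0.2]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    t = threshold_for_sensitivity(scores, labels, 1.0)
    print(t, specificity_at_threshold(scores, labels, t))
```

Because score distributions can differ between institutions, running this selection on each site's own data may yield a different threshold for the same target sensitivity, which is the calibration issue the key point describes.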