Division of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Heidelberg University Medical School, Heidelberg, Germany.
J Magn Reson Imaging. 2024 Apr;59(4):1409-1422. doi: 10.1002/jmri.28891. Epub 2023 Jul 28.
Weakly supervised learning promises reduced annotation effort while maintaining performance.
To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC).
Retrospective.
One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion of PC (65 ± 8 years) between January 2015 and November 2020 were split into a training set (N = 794, enriched with 204 PROSTATEx examinations) and a test set (N = 695).
FIELD STRENGTH/SEQUENCE: 1.5 and 3T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging.
Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivity of 0.97 [254/262] emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262] emulating PI-RADS ≥ 4 decisions.
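The operating-point comparison described above (specificity read off at a fixed sensitivity) can be illustrated with a minimal sketch. It assumes per-examination CNN scores and binary histopathology labels are available as plain lists; the function name, the tie handling, and the toy data are hypothetical and not taken from the study.

```python
import math

def specificity_at_sensitivity(labels, scores, target_sensitivity):
    """Return (threshold, sensitivity, specificity) at the highest threshold
    that still reaches the target sensitivity.

    labels: 1 = cancer, 0 = no cancer (hypothetical encoding).
    scores: model output, higher means more suspicious.
    A case is called positive when score >= threshold.
    """
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Number of positives that must be detected to reach the target.
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-12))
    threshold = pos[k - 1]  # score of the k-th highest-scoring positive

    true_pos = sum(s >= threshold for s in pos)
    true_neg = sum(s < threshold for s in neg)
    return threshold, true_pos / len(pos), true_neg / len(neg)

# Toy data, for illustration only.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
thr, sens, spec = specificity_at_sensitivity(toy_labels, toy_scores, 0.75)
print(thr, sens, spec)
```

Fixing sensitivity first and then comparing specificity puts both training schemes at the same clinical operating point, which is why the abstract reports specificity rather than accuracy.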
Receiver operating characteristic (ROC) curves and areas under the curve (AUC) were compared using the DeLong and Obuchowski tests. Sensitivity and specificity were compared using the McNemar test. The statistical significance threshold was P = 0.05.
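For readers unfamiliar with the McNemar test on paired decisions, a minimal sketch of the exact (binomial) variant follows. The abstract does not state which variant was used, so the exact form is an assumption here, and the discordant counts in the usage lines are made up for illustration.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test.

    b: cases classified correctly by model A only (hypothetical count).
    c: cases classified correctly by model B only (hypothetical count).
    Under the null hypothesis, each discordant pair falls on either side
    with probability 0.5, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Illustrative counts only, not data from the study.
print(mcnemar_exact(5, 5))  # balanced discordance
print(mcnemar_exact(1, 9))  # strongly one-sided discordance
```

Only the discordant pairs enter the test, which is why two models with nearly identical specificity on the same negatives can yield a P value close to 1.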
Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at the same training set sizes; however, the ROC-AUC difference decreased significantly from 200 to 998 training exams. Emulating PI-RADS ≥ 3 decisions, the difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity of 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70).
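The bracketed fractions can be checked with simple arithmetic. The short sketch below recomputes each reported proportion from its counts; the dictionary keys are descriptive labels chosen here, not terms from the study. The denominators 262 and 433 sum to the test set size of 695, which is consistent with reading them as the biopsy-positive and biopsy-negative examinations.

```python
# Reported counts from the abstract: (numerator, denominator).
reported = {
    "sensitivity, PI-RADS >= 3 emulation": (254, 262),
    "sensitivity, PI-RADS >= 4 emulation (low)": (231, 262),
    "sensitivity, PI-RADS >= 4 emulation (high)": (236, 262),
    "PLA specificity, >= 3, 998 exams": (51, 433),
    "SLA specificity, >= 3, 998 exams": (55, 433),
    "SLA specificity, >= 4, 998 exams": (221, 433),
    "PLA specificity, >= 4, 998 exams": (170, 433),
    "SLA specificity, >= 4, 200 exams": (159, 433),
}

# Proportions rounded to two decimals, as in the abstract.
proportions = {name: round(num / den, 2) for name, (num, den) in reported.items()}

for name, value in proportions.items():
    print(f"{name}: {value:.2f}")

# The two denominators add up to the test set size.
print(262 + 433)
```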
Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data.
EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.