Weakly Supervised MRI Slice-Level Deep Learning Classification of Prostate Cancer Approximates Full Voxel- and Slice-Level Annotation: Effect of Increasing Training Set Size.

Affiliations

Division of Radiology, German Cancer Research Center (DKFZ), Heidelberg, Germany.

Heidelberg University Medical School, Heidelberg, Germany.

Publication information

J Magn Reson Imaging. 2024 Apr;59(4):1409-1422. doi: 10.1002/jmri.28891. Epub 2023 Jul 28.

Abstract

BACKGROUND

Weakly supervised learning promises reduced annotation effort while maintaining performance.

PURPOSE

To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC).

STUDY TYPE

Retrospective.

SUBJECTS

One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion for PC (65 ± 8 years) between January 2015 and November 2020 were split into training (N = 794, enriched with 204 PROSTATEx examinations) and test set (N = 695).

FIELD STRENGTH/SEQUENCE

1.5 T and 3 T; T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging.

ASSESSMENT

Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivity of 0.97 [254/262] emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262] emulating PI-RADS ≥ 4 decisions.
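The operating-point comparison described above (specificity read off at a fixed sensitivity) can be illustrated with a minimal sketch. The function name `specificity_at_sensitivity`, the toy scores, and the tie-handling rule are assumptions made for illustration; the abstract does not state how the authors selected their thresholds.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) at a fixed-sensitivity operating point.

    A case is called positive when its score is >= threshold. The threshold is
    the highest score cut-off at which sensitivity reaches the target.
    This is an illustrative rule, not necessarily the one used in the study.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    # Number of positive cases that must be detected to reach the target.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-12))
    threshold = positives[needed - 1]

    true_positives = sum(s >= threshold for s in positives)
    true_negatives = sum(s < threshold for s in negatives)
    return (
        threshold,
        true_positives / len(positives),
        true_negatives / len(negatives),
    )


# Toy data (hypothetical model outputs, 1 = cancer, 0 = no cancer).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(specificity_at_sensitivity(toy_scores, toy_labels, 0.75))
print(specificity_at_sensitivity(toy_scores, toy_labels, 1.0))
```

Fixing sensitivity first and then comparing specificity puts two models on the same clinical operating point, which is why the abstract reports specificity separately for the PI-RADS ≥ 3 and PI-RADS ≥ 4 emulations.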

STATISTICAL TESTS

Receiver operating characteristic (ROC) curves and areas under the curve (AUC) were compared using the DeLong and Obuchowski tests. Sensitivity and specificity were compared using the McNemar test. The statistical significance threshold was P = 0.05.
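As a rough illustration of the McNemar test named above, the sketch below computes a two-sided exact (binomial) P value from the two discordant counts of a paired comparison. Whether the authors used the exact or the chi-square form is not stated in the abstract, so the exact form is an assumption here; the function name `mcnemar_exact` and the example counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value.

    b: cases classified correctly by model A only
    c: cases classified correctly by model B only
    Under the null hypothesis, each discordant case is equally likely
    to favor either model, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not taken from the study.
print(mcnemar_exact(1, 9))
print(mcnemar_exact(5, 5))
```

Only the discordant pairs enter the test, which is why two models with nearly identical specificity on the same patients can yield a P value of 1.0.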

RESULTS

Test set (N = 695) ROC-AUC of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78 at the same training set sizes. Both approaches improved significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC differed significantly between SLA and PLA at the same training set sizes; however, the ROC-AUC difference decreased significantly from 200 to 998 training exams. Emulating PI-RADS ≥ 3 decisions, the difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity of 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70).
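The bracketed counts can be checked against the rounded proportions with a few lines of arithmetic. The sketch below only uses numbers quoted in the abstract; reading the denominators 262 and 433 as the positive and negative test cases is an inference from the fact that they sum to the test set size, and the helper name `matches_reported` is hypothetical.

```python
# (description, numerator, denominator, value reported in the abstract)
reported = [
    ("sensitivity, PI-RADS >= 3 emulation", 254, 262, 0.97),
    ("sensitivity, PI-RADS >= 4 emulation (lower)", 231, 262, 0.88),
    ("sensitivity, PI-RADS >= 4 emulation (upper)", 236, 262, 0.90),
    ("PLA specificity, PI-RADS >= 3, 998 exams", 51, 433, 0.12),
    ("SLA specificity, PI-RADS >= 3, 998 exams", 55, 433, 0.13),
    ("SLA specificity, PI-RADS >= 4, 998 exams", 221, 433, 0.51),
    ("PLA specificity, PI-RADS >= 4, 998 exams", 170, 433, 0.39),
    ("SLA specificity, PI-RADS >= 4, 200 exams", 159, 433, 0.37),
]


def matches_reported(numerator, denominator, value, decimals=2):
    """True when the fraction rounds to the reported value."""
    return round(numerator / denominator, decimals) == value


all_consistent = all(matches_reported(n, d, v) for _, n, d, v in reported)

# Cohort arithmetic stated in the abstract.
test_set_ok = 262 + 433 == 695      # inferred positives + negatives = test set
training_ok = 794 + 204 == 998      # institutional + PROSTATEx = largest training set
cohort_ok = 794 + 695 == 1489       # institutional training + test = all examinations

print(all_consistent, test_set_ok, training_ok, cohort_ok)
```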

DATA CONCLUSION

Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data.

EVIDENCE LEVEL

3

TECHNICAL EFFICACY

Stage 2.
