Netzer Nils, Weißer Cedric, Schelb Patrick, Wang Xianfeng, Qin Xiaoyan, Görtz Magdalena, Schütz Viktoria, Radtke Jan Philipp, Hielscher Thomas, Schwab Constantin, Stenzinger Albrecht, Kuder Tristan Anselm, Gnirs Regula, Hohenfellner Markus, Schlemmer Heinz-Peter, Maier-Hein Klaus H, Bonekamp David
Department of Urology, University of Heidelberg Medical Center.
Division of Biostatistics, German Cancer Research Center.
Invest Radiol. 2021 Dec 1;56(12):799-808. doi: 10.1097/RLI.0000000000000791.
The potential of deep learning to support radiologist prostate magnetic resonance imaging (MRI) interpretation has been demonstrated.
The aim of this study was to evaluate the effects of increased and diversified training data (TD) on deep learning performance for detection and segmentation of clinically significant prostate cancer-suspicious lesions.
In this retrospective study, biparametric (T2-weighted and diffusion-weighted) prostate MRI acquired with multiple 1.5-T and 3.0-T MRI scanners in consecutive men was used for training and testing of prostate segmentation and lesion detection networks. Ground truth was the combination of targeted and extended systematic MRI-transrectal ultrasound fusion biopsies, with significant prostate cancer defined as International Society of Urological Pathology grade group greater than or equal to 2. U-Nets were internally validated on full, reduced, and PROSTATEx-enhanced training sets and subsequently externally validated on the institutional test set and the PROSTATEx test set. U-Net segmentation was calibrated to clinically desired levels in cross-validation, and test performance was subsequently compared using sensitivities, specificities, predictive values, and Dice coefficient.
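The Dice coefficient named above measures the overlap between a predicted segmentation mask and a reference mask, Dice = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition, not the authors' implementation. The function name `dice_coefficient`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks.

    Masks are given as equally shaped nested lists of 0/1 values
    (a hypothetical representation chosen to keep the sketch dependency-free).
    """
    pred_flat = [v for row in pred for v in row]
    truth_flat = [v for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    # Total foreground voxels in each mask, summed.
    total = sum(1 for p in pred_flat if p) + sum(1 for t in truth_flat if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy example: two 2x3 masks that share 2 of their 3 foreground voxels each.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
toy_dice = dice_coefficient(predicted, reference)
```

On the toy masks the intersection is 2 voxels and each mask has 3 foreground voxels, so the result is 4/6.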
One thousand four hundred eighty-eight institutional examinations (median age, 64 years; interquartile range, 58-70 years) were temporally split into training (2014-2017, 806 examinations, supplemented by 204 PROSTATEx examinations) and test (2018-2020, 682 examinations) sets. In the test set, Prostate Imaging-Reporting and Data System (PI-RADS) cutoffs greater than or equal to 3 and greater than or equal to 4 on a per-patient basis had sensitivity of 97% (241/249) and 90% (223/249) at specificity of 19% (82/433) and 56% (242/433), respectively. The full U-Net had corresponding sensitivity of 97% (241/249) and 88% (219/249) with specificity of 20% (86/433) and 59% (254/433), not statistically different from PI-RADS (P > 0.3 for all comparisons). U-Net trained using a reduced set of 171 consecutive examinations achieved inferior performance (P < 0.001). PROSTATEx training enhancement did not improve performance. Dice coefficients were 0.90 for prostate and 0.42/0.53 for MRI lesion segmentation at PI-RADS category 3/4 equivalents.
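The per-patient percentages above follow directly from the reported counts (249 test-set men with significant prostate cancer, 433 without). The sketch below recomputes them as a consistency check. The helper name `summarize` and the dictionary layout are hypothetical, and the predictive values it returns are derived from the same counts rather than quoted from the abstract.

```python
POSITIVES = 249  # test-set men with significant prostate cancer (from the abstract)
NEGATIVES = 433  # test-set men without significant prostate cancer (from the abstract)

# True positives and true negatives as reported for each reader and cutoff.
REPORTED = {
    "PI-RADS >= 3": {"tp": 241, "tn": 82},
    "PI-RADS >= 4": {"tp": 223, "tn": 242},
    "U-Net (PI-RADS 3 equivalent)": {"tp": 241, "tn": 86},
    "U-Net (PI-RADS 4 equivalent)": {"tp": 219, "tn": 254},
}


def summarize(tp, tn, positives=POSITIVES, negatives=NEGATIVES):
    """Return sensitivity, specificity, PPV, and NPV in percent."""
    fn = positives - tp  # cancers missed at this cutoff
    fp = negatives - tn  # men without cancer flagged at this cutoff
    return {
        "sensitivity": 100.0 * tp / positives,
        "specificity": 100.0 * tn / negatives,
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


results = {name: summarize(**counts) for name, counts in REPORTED.items()}

for name, metrics in results.items():
    print(name, {key: round(value, 1) for key, value in metrics.items()})
```

Rounded to whole percentages, the recomputed sensitivities and specificities match the values stated in the text.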
In a large institutional test set, U-Net performed similarly to clinical PI-RADS assessment and benefited from a larger training set, whereas adding multiscanner or bi-institutional TD improved neither institutional nor PROSTATEx test performance.