Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada.
Joint Department of Medical Imaging, University Health Network, Sinai Health System and University of Toronto, Toronto, ON, Canada.
Eur Radiol. 2020 Dec;30(12):6867-6876. doi: 10.1007/s00330-020-07030-1. Epub 2020 Jun 26.
To benchmark the performance of a calibrated 3D convolutional neural network (CNN) applied to multiparametric MRI (mpMRI) for risk assessment of clinically significant prostate cancer (csPCa) using decision curve analysis (DCA).
We retrospectively analyzed 499 patients who had positive mpMRI (PI-RADSv2 ≥ 3) and underwent MRI-targeted biopsy. The training cohort comprised 449 men, including a calibration set of 50 men. The biopsy decision strategies compared were: using risk estimates from the CNN (original or calibrated), performing biopsy only in men with PI-RADSv2 ≥ 4, or performing biopsy in men with PI-RADSv2 ≥ 4 and additionally in men with PI-RADSv2 3 and PSA density (PSAd) ≥ 0.15 ng/ml/ml. Discrimination, calibration and clinical usefulness in the unseen test cohort (n = 50) were assessed using the C-statistic, calibration plots and DCA, respectively.
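For readers unfamiliar with DCA, the quantity it plots is the net benefit of a decision strategy at a chosen risk threshold. The sketch below uses the standard net benefit definition (true positives minus false positives weighted by the threshold odds, per patient); it is not the authors' code. The function names and the toy labels and risks are hypothetical and are not data from the study.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of biopsying every patient whose predicted risk >= threshold.

    y_true: list of 0/1 outcomes (1 = csPCa found), risk: predicted probabilities.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight implied by the threshold
    return tp / n - (fp / n) * odds


def net_benefit_biopsy_all(y_true, threshold):
    """Net benefit of the reference strategy that biopsies every patient."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical toy cohort of 10 patients (illustration only, not study data).
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
toy_risks = [0.90, 0.60, 0.05, 0.20, 0.08, 0.30, 0.02, 0.40, 0.07, 0.01]

if __name__ == "__main__":
    for pt in (0.05, 0.10, 0.20):
        print(pt, net_benefit(toy_labels, toy_risks, pt),
              net_benefit_biopsy_all(toy_labels, pt))
```

A decision curve is obtained by evaluating both functions over a range of thresholds; a strategy is clinically useful at a given threshold when its net benefit exceeds that of both "biopsy all" and "biopsy none" (which is zero by definition).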
The calibrated CNN achieved moderate calibration (Hosmer-Lemeshow calibration test, p = 0.41) and good discrimination (C = 0.85). DCA revealed consistently higher net benefit and net reduction in biopsies for the calibrated CNN compared with the original CNN, PI-RADSv2 ≥ 4 and the combined strategy of PI-RADSv2 and PSAd. Original CNN predictions were severely miscalibrated (p < 0.0001), resulting in net harm compared with a 'biopsy all patients' strategy. At risk thresholds ≥ 10%, the calibrated CNN and the combined strategy reduced the number of biopsies by an estimated 201 and 55 men, respectively, per 1000 men at risk, without missing csPCa, while the original CNN and PI-RADSv2 ≥ 4 could not achieve a net reduction in biopsies.
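The "net reduction in biopsies per 1000 men" reported here is conventionally derived in DCA as avoided true-negative biopsies minus missed cancers weighted by the inverse threshold odds. The following is a minimal sketch of that standard formula, assuming each strategy is expressed as a per-patient biopsy/no-biopsy decision; the function name and inputs are hypothetical, and the abstract does not state how the authors implemented the calculation.

```python
def net_reduction_per_1000(y_true, biopsy, threshold):
    """Net reduction in biopsies per 1000 patients, relative to biopsying everyone.

    y_true: list of 0/1 outcomes (1 = csPCa), biopsy: list of booleans giving the
    strategy's decision for each patient, threshold: risk threshold in (0, 1).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Biopsies correctly avoided in men without csPCa.
    tn = sum(1 for y, b in zip(y_true, biopsy) if not b and y == 0)
    # Cancers missed because no biopsy was performed.
    fn = sum(1 for y, b in zip(y_true, biopsy) if not b and y == 1)
    inverse_odds = (1.0 - threshold) / threshold  # penalty per missed cancer
    return (tn - fn * inverse_odds) / n * 1000.0


if __name__ == "__main__":
    # Hypothetical toy cohort (illustration only, not study data).
    labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
    decisions = [True, True, False, True, False, True, False, True, False, False]
    print(net_reduction_per_1000(labels, decisions, 0.10))
```

Under this definition a strategy that misses no cancers scores exactly the number of biopsies it avoids per 1000 patients, while each missed cancer at a 10% threshold cancels roughly nine avoided biopsies, which is why a miscalibrated model can show no net reduction at all.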
DCA revealed that our calibrated 3D-CNN resulted in fewer unnecessary biopsies compared with using PI-RADSv2 alone or in combination with PSAd. CNN calibration is important in achieving clinical utility.
• A 3D deep learning model applied to multiparametric MRI may help to prevent unnecessary prostate biopsies in patients eligible for MRI-targeted biopsy.
• Owing to miscalibration, original risk estimates by the deep learning model require prior calibration to enable clinical utility.
• Decision curve analysis confirmed a net benefit of using our calibrated deep learning model for biopsy decisions compared with alternative strategies, including PI-RADSv2 alone and in combination with prostate-specific antigen density.