University Medical Centre Groningen, Hanzeplein 1, 9713 GZ, Groningen, The Netherlands.
University of Twente, Drienerlolaan 5, 7522 NB, Enschede, The Netherlands.
Eur Radiol. 2024 Nov;34(11):7364-7372. doi: 10.1007/s00330-024-10771-y. Epub 2024 May 9.
Deep learning (DL) MRI reconstruction enables fast scan acquisition with good visual quality, but its diagnostic impact is often not assessed because doing so requires large reader studies. This study used an existing diagnostic DL model to assess the diagnostic quality of reconstructed images.
A retrospective multisite study of 1535 patients assessed biparametric prostate MRI acquired between 2016 and 2020. Likely clinically significant prostate cancer (csPCa) lesions (PI-RADS 4) were delineated by expert radiologists. T2-weighted scans were retrospectively undersampled, simulating accelerated protocols. A DL reconstruction model (DLRecon) and a diagnostic DL detection model (DLDetect) were developed. The effects on the partial area under the Free-Response Operating Characteristic (FROC) curve (pAUC) and on the structural similarity (SSIM) were compared as metrics for diagnostic and visual quality, respectively. DLDetect was validated with a reader concordance analysis. Statistical analysis included Wilcoxon, permutation, and Cohen's kappa tests for visual quality, diagnostic performance, and reader concordance, respectively.
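The retrospective undersampling step can be illustrated with a minimal sketch. It is not the study's pipeline: it assumes a single-coil magnitude image, a Cartesian mask that keeps every R-th phase-encode line plus a small fully sampled centre, and a zero-filled inverse FFT as the naïve reconstruction. The function names and the `center_fraction` default are hypothetical.

```python
import numpy as np

def undersample_mask(n_lines, acceleration, center_fraction=0.08):
    """Boolean mask over phase-encode lines (assumed sampling scheme):
    every `acceleration`-th line plus a fully sampled central band."""
    mask = np.zeros(n_lines, dtype=bool)
    mask[::acceleration] = True
    n_center = max(1, int(round(n_lines * center_fraction)))
    start = (n_lines - n_center) // 2
    mask[start:start + n_center] = True
    return mask

def zero_filled_recon(image, acceleration, center_fraction=0.08):
    """Simulate accelerated acquisition of `image` and return the
    zero-filled (naive) reconstruction together with the line mask."""
    # Centred 2D FFT: image space -> k-space with DC at the array centre.
    kspace = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))
    mask = undersample_mask(image.shape[0], acceleration, center_fraction)
    # Discard the phase-encode lines (rows) that were not "acquired".
    kspace_us = kspace * mask[:, None]
    # Centred inverse FFT back to image space; keep the magnitude.
    recon = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace_us)))
    return np.abs(recon), mask

# Example: R4 and R8 versions of a synthetic 64 x 64 image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
recon_r4, mask_r4 = zero_filled_recon(image, acceleration=4)
recon_r8, mask_r8 = zero_filled_recon(image, acceleration=8)
```

Because the central band is always kept, the effective acceleration in this sketch is somewhat lower than the nominal factor R.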
DLRecon improved visual quality at 4- and 8-fold (R4, R8) subsampling rates, with SSIM (range: -1 to 1) improving to 0.78 ± 0.02 (p < 0.001) and 0.67 ± 0.03 (p < 0.001) from 0.68 ± 0.03 and 0.51 ± 0.03, respectively. However, diagnostic performance at R4 showed a pAUC FROC of 1.33 (CI 1.28-1.39) for DL and 1.29 (CI 1.23-1.35) for naïve reconstructions, both significantly lower than the fully sampled pAUC of 1.58 (DL: p = 0.024, naïve: p = 0.02). Similar trends were noted for R8.
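To make the SSIM range of -1 to 1 concrete, the sketch below evaluates the SSIM formula once over the whole image. This is a simplification assumed for illustration: SSIM is usually computed in local sliding windows and averaged, and the abstract does not state which implementation was used. The function name `global_ssim` is hypothetical; the constants 0.01 and 0.03 are the conventional SSIM stabilisers.

```python
import numpy as np

def global_ssim(reference, test, data_range=1.0):
    """Single-window SSIM over the full image (simplified illustration)."""
    x = np.asarray(reference, dtype=np.float64)
    y = np.asarray(test, dtype=np.float64)
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
same = global_ssim(reference, reference)            # identical images
inverted = global_ssim(reference, 1.0 - reference)  # contrast-inverted copy
```

Identical images give a score of 1, while a contrast-inverted copy gives a negative score because its covariance with the reference is negative.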
DL reconstruction produces visually appealing images but may reduce diagnostic accuracy. Incorporating diagnostic AI into the assessment framework offers a clinically relevant metric essential for adopting reconstruction models into clinical practice.
In clinical settings, caution is warranted when using DL reconstruction for MRI scans. While it recovered visual quality, it failed to match the prostate cancer detection rates observed in scans not subjected to acceleration and DL reconstruction.