Debs Noëlie, Routier Alexandre, Bône Alexandre, Rohé Marc-Miche
Guerbet Research, Paris, France.
Eur Radiol. 2025 Jun;35(6):3134-3143. doi: 10.1007/s00330-024-11287-1. Epub 2024 Dec 19.
This study aims to evaluate a deep learning pipeline for detecting clinically significant prostate cancer (csPCa), defined as Gleason Grade Group (GGG) ≥ 2, using biparametric MRI (bpMRI) and compare its performance with radiological reading.
The training dataset included 4381 bpMRI cases (3800 positive and 581 negative) from three continents, with 80% annotated using PI-RADS and 20% with Gleason Scores. The testing set comprised 328 cases from the PROSTATEx dataset, of which 34% were positive (GGG ≥ 2) and 66% negative. A 3D nnU-Net was trained on bpMRI for lesion detection, evaluated against histopathology-based annotations, and assessed with patient- and lesion-level metrics, with results also broken down by lesion volume and GGG. The algorithm was compared to non-expert radiologists using multi-parametric MRI (mpMRI).
The model achieved an AUC of 0.83 (95% CI: 0.80, 0.87). Lesion-level sensitivity was 0.85 (95% CI: 0.82, 0.94) at 0.5 False Positives per volume (FP/volume) and 0.88 (95% CI: 0.79, 0.92) at 1 FP/volume. Average Precision was 0.55 (95% CI: 0.46, 0.64). The model showed over 0.90 sensitivity for lesions larger than 650 mm³ and exceeded 0.85 across GGGs. It had higher true positive rates (TPRs) than radiologists at equivalent FP rates, achieving TPRs of 0.93 and 0.79 compared to radiologists' 0.87 and 0.68 for PI-RADS ≥ 3 and PI-RADS ≥ 4 lesions, respectively (p ≤ 0.05).
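As an illustration of how a figure such as "sensitivity at 0.5 FP/volume" can be read off a set of scored detections, the following is a minimal sketch. It is not the authors' evaluation code. The function name, the input format, and the toy data are assumptions made for this example, and it presumes each detection has already been matched to a ground-truth lesion (or marked as a false positive) by an upstream rule, with each lesion counted at most once.

```python
def sensitivity_at_fp_rate(detections, n_lesions, n_volumes, fp_per_volume):
    """Lesion-level sensitivity at a fixed false-positive budget.

    detections: list of (score, is_true_positive) pairs pooled over all
        volumes (hypothetical format; matching is assumed done upstream).
    n_lesions: total number of ground-truth lesions in the test set.
    n_volumes: total number of scanned volumes (patients) in the test set.
    fp_per_volume: allowed average number of false positives per volume.
    """
    # Lowering the score threshold corresponds to walking down this ranking.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    max_fp = fp_per_volume * n_volumes

    tp = fp = 0
    best_tp = 0
    i = 0
    while i < len(ranked):
        # Detections sharing a score enter together at the same threshold.
        score = ranked[i][0]
        j = i
        while j < len(ranked) and ranked[j][0] == score:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp > max_fp:
            break  # this threshold exceeds the false-positive budget
        best_tp = tp
        i = j
    return best_tp / n_lesions


# Toy data only: 4 volumes, 4 true lesions, 6 candidate detections.
toy_detections = [
    (0.9, True), (0.8, False), (0.7, True),
    (0.6, False), (0.5, True), (0.4, False),
]
sens_half = sensitivity_at_fp_rate(toy_detections, 4, 4, 0.5)
sens_quarter = sensitivity_at_fp_rate(toy_detections, 4, 4, 0.25)
```

Repeating this calculation over a range of FP/volume values traces out a free-response ROC curve. Confidence intervals like those reported above are commonly obtained by resampling patients with replacement, although the abstract does not state which method was used.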
The DL model showed strong performance in detecting csPCa on an independent test cohort, surpassing radiological interpretation and demonstrating AI's potential to improve diagnostic accuracy for non-expert radiologists. However, detecting small lesions remains challenging.
Question: Current prostate cancer detection methods often do not involve non-expert radiologists, highlighting the need for more accurate deep learning approaches using biparametric MRI. Findings: Our model significantly outperforms radiologists, showing consistent performance across Gleason Grade Groups and for medium to large lesions. Clinical relevance: This AI model improves prostate cancer detection accuracy in prostate imaging, serves as a benchmark with reference performance on a public dataset, and offers public PI-RADS annotations, enhancing transparency and facilitating further research and development.