Molecular Imaging Branch, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA (M.J.B., E.C.Y., Y.L., L.J., K.M.M., N.S.L., P.L.C., S.A.H., B.T.).
Department of Radiology, Singapore General Hospital, Singapore (Y.M.L.).
Acad Radiol. 2024 Apr;31(4):1429-1437. doi: 10.1016/j.acra.2023.09.030. Epub 2023 Oct 17.
Prostate MRI quality is essential in guiding prostate biopsies. However, assessment of MRI quality is subjective and variable. Sources of quality degradation affect each sequence differently, for example T2W versus DWI, so sequence-specific quality assessment may be more effective. This study aims to develop an AI tool that evaluates T2W prostate MRI quality more consistently, efficiently identifying suboptimal scans while minimizing user bias.
This retrospective study included 1046 patients from three cohorts (ProstateX [n = 347], all-comer in-house [n = 602], enriched bad-quality MRI in-house [n = 97]) scanned between January 2011 and May 2022. An expert reader assigned each T2W MRI a quality score: 0 for scans considered non-diagnostic or unusable, 1 for scans of acceptable diagnostic quality and some usability but with some quality distortions present, and 2 for scans considered of optimal diagnostic quality and usability. A train-validation-test split of 70:15:15 was applied, with MRI scanners and protocols distributed equally across all partitions. The T2W quality AI classification model was based on the 3D DenseNet121 architecture using the MONAI framework. In addition to multiclass classification, binary classification was performed (classes 0/1 vs. 2). Partial occlusion sensitivity maps were generated for anatomical correlation. Three body radiologists rated a subgroup of 60 test cases, and reproducibility was assessed using weighted Cohen's kappa.
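As an illustration of the agreement statistic and the binary grouping described above, the following is a minimal pure-Python sketch. It is not the study's code: the function names `to_binary` and `weighted_kappa` are hypothetical, and the linear weighting used by default is an assumption, since the abstract does not state whether linear or quadratic weights were applied.

```python
def to_binary(score):
    """Collapse the 3-level quality score into the binary task (classes 0/1 vs. 2).

    Returns 0 for scores 0 or 1 (quality distortion present) and 1 for score 2 (optimal).
    """
    if score not in (0, 1, 2):
        raise ValueError("quality score must be 0, 1, or 2")
    return 1 if score == 2 else 0


def weighted_kappa(rater_a, rater_b, n_classes=3, weighting="linear"):
    """Weighted Cohen's kappa for two raters scoring the same cases.

    `weighting` is "linear" (assumed default) or "quadratic".
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must score the same non-empty set of cases")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    n = len(rater_a)
    # Observed confusion matrix: rows are rater A, columns are rater B.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1

    row_totals = [sum(observed[i]) for i in range(n_classes)]
    col_totals = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    power = 1 if weighting == "linear" else 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement weight: 0 on the diagonal, 1 for the largest possible gap.
            w = (abs(i - j) / (n_classes - 1)) ** power
            weighted_observed += w * observed[i][j]
            # Expected count under independence of the two raters' marginals.
            weighted_expected += w * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single class")
    return 1.0 - weighted_observed / weighted_expected
```

With this definition, two raters who agree on every case score 1.0, and disagreements between adjacent classes (for example 1 vs. 2) are penalized less than disagreements between classes 0 and 2.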
The best validation multiclass accuracy achieved during training was 77.1% (121/157). In the test dataset, multiclass accuracy was 73.9% (116/157), whereas binary accuracy was 84.7% (133/157). Sub-class sensitivity for binary quality distortion classification for class 0 was 100% (18/18), and sub-class specificity for T2W classification of absence/minimal quality distortions for class 2 was 90.5% (95/105). All three readers showed moderate to substantial agreement with ground truth (R1, R2, and R3: κ = 0.588, 0.649, and 0.487, respectively), moderate to substantial agreement with each other (R1-R2 κ = 0.599, R1-R3 κ = 0.612, R2-R3 κ = 0.685), and fair to moderate agreement with AI (R1, R2, and R3: κ = 0.445, 0.410, and 0.292, respectively). AI showed substantial agreement with ground truth (κ = 0.704). 3D quality heatmap evaluation revealed that, from an AI perspective, the imaging feature most critical to a non-diagnostic rating was obscuration of the rectoprostatic space (94.4%, 17/18).
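The reported percentages and agreement labels can be cross-checked with a short sketch. The counts and κ values below are taken from the text. The helper names `pct` and `agreement_band` are hypothetical, and the use of the Landis and Koch bands (slight, fair, moderate, substantial, almost perfect) is an assumption: the abstract uses matching terms but does not name the scale.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100.0 * numerator / denominator, 1)


def agreement_band(kappa):
    """Map a kappa value to a Landis and Koch band (assumed interpretation scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Counts as stated in the results.
reported_counts = {
    "validation multiclass accuracy": (121, 157),
    "test multiclass accuracy": (116, 157),
    "test binary accuracy": (133, 157),
    "class 0 sensitivity": (18, 18),
    "class 2 specificity": (95, 105),
    "rectoprostatic space obscuration": (17, 18),
}

# Kappa values as stated in the results.
reported_kappas = {
    "R1 vs ground truth": 0.588,
    "R2 vs ground truth": 0.649,
    "R3 vs ground truth": 0.487,
    "R1 vs AI": 0.445,
    "R2 vs AI": 0.410,
    "R3 vs AI": 0.292,
    "AI vs ground truth": 0.704,
}

for name, (num, den) in reported_counts.items():
    print(f"{name}: {pct(num, den)}% ({num}/{den})")

for name, kappa in reported_kappas.items():
    print(f"{name}: kappa = {kappa:.3f} ({agreement_band(kappa)})")
```

Under these assumed bands, the recomputed percentages match the reported figures, and the labels match the wording used in the results.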
The 3D AI model can assess T2W prostate MRI quality with moderate accuracy and translate whole sequence-level classification labels into 3D voxel-level quality heatmaps for interpretation. Image quality has a significant downstream impact on ruling out clinically significant cancers. AI may help identify, reproducibly and with explainability, MRI sequences that require re-acquisition.