Nakai Hirotsugu, Froemming Adam T, Kawashima Akira, LeGout Jordan D, Kurata Yasuhisa, Gloe Jacob N, Borisch Eric A, Riederer Stephen J, Takahashi Naoki
Department of Radiology, Mayo Clinic, Rochester, USA.
Department of Radiology, Mayo Clinic, Scottsdale, AZ, USA.
Abdom Radiol (NY). 2025 Aug 25. doi: 10.1007/s00261-025-05163-9.
To determine whether deep learning (DL)-based image quality (IQ) assessment of T2-weighted images (T2WI) could be biased by the presence of clinically significant prostate cancer (csPCa).
In this three-center retrospective study, five abdominal radiologists categorized the IQ of 2,105 transverse T2WI series as optimal or as showing mild, moderate, or severe degradation. An IQ classification model was developed using 1,719 series (development set). Agreement between the model and the radiologists was assessed on the remaining 386 series using the quadratic weighted kappa. The model was then applied to 11,723 examinations that were not included in the development set and had no documented prostate cancer at the time of MRI (patient age, 65.5 ± 8.3 years [mean ± standard deviation]). Examinations categorized as showing mild to severe degradation formed the target groups, whereas those categorized as optimal were used to construct matched control groups. Case-control matching was performed to mitigate the effects of pre-MRI confounding factors such as age and prostate-specific antigen value. The proportion of patients with csPCa was compared between the target and matched control groups using the chi-squared test.
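The quadratic weighted kappa used for model-radiologist agreement penalizes disagreements by the squared distance between ordinal categories, so "optimal vs. severe" costs more than "optimal vs. mild". Below is a minimal NumPy sketch of the standard formula. It is not the authors' code. The integer coding of the four IQ categories (0 = optimal through 3 = severe) and the function name `quadratic_weighted_kappa` are hypothetical.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, n_categories=4):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_categories-1.

    Hypothetical coding: 0 = optimal, 1 = mild, 2 = moderate, 3 = severe degradation.
    """
    a = np.asarray(rater_a, dtype=int)
    b = np.asarray(rater_b, dtype=int)
    k = n_categories

    # Observed matrix: observed[i, j] = number of series rated i by A and j by B.
    observed = np.zeros((k, k), dtype=float)
    np.add.at(observed, (a, b), 1)

    # Expected matrix under independence: outer product of the marginals / N.
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

    # Quadratic disagreement weights w_ij = (i - j)^2 / (k - 1)^2.
    idx = np.arange(k)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2

    denom = (weights * expected).sum()
    if denom == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - (weights * observed).sum() / denom
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement. Values around 0.5, such as the 0.53 reported in this study, are commonly described as moderate.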
Agreement between the model and the radiologists was moderate (quadratic weighted kappa, 0.53). The groups with mild and moderate IQ degradation had significantly higher csPCa proportions than their matched control groups with optimal IQ: moderate (N = 126) vs. optimal (N = 504), 26.3% vs. 22.7%, difference = 3.6% [95% confidence interval: 0.4%, 6.8%], p = 0.03; mild (N = 1,399) vs. optimal (N = 1,399), 22.9% vs. 20.2%, difference = 2.7% [95% confidence interval: 0.7%, 4.7%], p = 0.008.
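The group comparison reports a difference in csPCa proportions with a 95% confidence interval and a chi-squared p-value. The sketch below shows one common way to compute these quantities from a 2x2 table: a Pearson chi-squared test without continuity correction, plus a Wald interval for the difference of two independent proportions. It is an illustrative assumption, not necessarily the authors' procedure. A matched design may call for a different interval. The counts in the usage line are hypothetical and are not the study's data.

```python
import math

def compare_proportions(x1, n1, x2, n2, z=1.959963984540054):
    """Compare two proportions x1/n1 and x2/n2 (treated as independent samples).

    Returns (difference, (ci_low, ci_high), chi2, p_value), using a Pearson
    chi-squared test (1 df, no continuity correction) and a Wald 95% CI.
    """
    p1, p2 = x1 / n1, x2 / n2

    # 2x2 table: rows = groups, columns = (csPCa, no csPCa).
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    n = n1 + n2
    row_totals = [n1, n2]
    col_totals = [x1 + x2, n - x1 - x2]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Wald interval for the difference p1 - p2.
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se), chi2, p_value

# Hypothetical counts: 30/100 in a degraded-IQ group vs. 20/100 in its controls.
diff, ci, chi2, p = compare_proportions(30, 100, 20, 100)
print(f"diff={diff:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), chi2={chi2:.3f}, p={p:.4f}")
```

With these made-up counts, the difference is 0.10, but the interval crosses zero and p is about 0.10. This illustrates why the study's larger matched groups were needed to detect differences of roughly 3 percentage points.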
DL-assessed IQ tended to be worse in patients with csPCa, raising concerns about the clinical application of DL-based IQ assessment.