Konovalova Natalia, Tolpadi Aniket, Liu Felix, Akkaya Zehra, Luitjens Johanna, Gassert Felix, Giesler Paula, Bhattacharjee Rupsa, Han Misung, Bahroos Emma, Majumdar Sharmila, Pedoia Valentina
Radiology and Biomedical Imaging Department, University of California, San Francisco, San Francisco, CA, United States.
Bioengineering Department, University of California, Berkeley, Berkeley, CA, United States.
Radiol Adv. 2025 Apr 4;2(2):umaf015. doi: 10.1093/radadv/umaf015. eCollection 2025 Mar.
Accurate interpretation of meniscal anomalies on knee MRI is critical for diagnosis and treatment planning, with artificial intelligence emerging as a promising tool to support and enhance this process through automated anomaly detection.
To evaluate the impact of an artificial intelligence (AI) anomaly detection assistant on radiologists' interpretation of meniscal anomalies in undersampled, deep learning (DL)-reconstructed knee MRI and assess the relationship between reconstruction quality metrics and anomaly detection performance.
This retrospective study included 947 knee MRI examinations; 51 were excluded for poor image quality, leaving 896 participants (mean age, 44.7 ± 15.3 years; 472 women). Using 8-fold undersampled data, DL-based reconstructed images were generated. An object detection model was trained on original, fully sampled images and evaluated on 1 original and 14 DL-reconstructed test sets to identify meniscal lesions. Standard reconstruction metrics (normalized root mean square error, peak signal-to-noise ratio, and structural similarity index) and anomaly detection metrics (mean average precision, F1 score) were quantified and compared. Two radiologists independently reviewed a stratified sample of 50 examinations unassisted and assisted with AI-predicted anomaly boxes. McNemar's test evaluated differences in diagnostic performance; Cohen's kappa assessed interrater agreement.
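The distinction between image-based and box-based reconstruction metrics can be illustrated with a small sketch. The abstract does not spell out how the box-based metrics were computed; the code below assumes "box-based" means the metric is evaluated only over pixels inside an anomaly bounding box. The helper names (`crop`, `nrmse`, `psnr`), the `(x0, y0, x1, y1)` box convention, the Euclidean-norm normalization for NRMSE, and the toy 4 × 4 images are all illustrative assumptions, not the authors' implementation. SSIM is omitted for brevity, and plain Python lists are used only to keep the example free of dependencies.

```python
import math

def crop(image, box):
    """Return the sub-image inside box = (x0, y0, x1, y1), end-exclusive (assumed convention)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def flatten(image):
    """Flatten a list-of-rows image into a single list of pixel values."""
    return [value for row in image for value in row]

def nrmse(reference, reconstruction):
    """Error norm divided by the reference's Euclidean norm (one common NRMSE convention)."""
    ref, rec = flatten(reference), flatten(reconstruction)
    squared_error = sum((a - b) ** 2 for a, b in zip(ref, rec))
    reference_energy = sum(a ** 2 for a in ref)
    return math.sqrt(squared_error / reference_energy)

def psnr(reference, reconstruction, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    ref, rec = flatten(reference), flatten(reconstruction)
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(data_range ** 2 / mse)

# Toy reference image: uniform background with a brighter 2 x 2 "lesion".
reference = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 2.0, 1.0],
    [1.0, 2.0, 2.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
# Toy reconstruction: identical except for one blurred pixel inside the lesion.
reconstruction = [row[:] for row in reference]
reconstruction[1][1] = 1.5

lesion_box = (1, 1, 3, 3)  # hypothetical anomaly bounding box
data_range = 1.0           # max - min of the reference image

image_nrmse = nrmse(reference, reconstruction)
box_nrmse = nrmse(crop(reference, lesion_box), crop(reconstruction, lesion_box))
image_psnr = psnr(reference, reconstruction, data_range)
box_psnr = psnr(crop(reference, lesion_box), crop(reconstruction, lesion_box), data_range)

print(f"image-based: NRMSE={image_nrmse:.4f}, PSNR={image_psnr:.2f} dB")
print(f"box-based:   NRMSE={box_nrmse:.4f}, PSNR={box_psnr:.2f} dB")
```

In this toy case the only error lies inside the box, so the box-based values look worse than the whole-image values, where the same error is averaged over many correctly reconstructed background pixels. This is one plausible reason a box-restricted metric could track detection performance more closely than a whole-image metric.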
On the original images, the anomaly detection model achieved 70.53% precision, 72.17% recall, 63.09% mean average precision (mAP), and a 71.34% F1 score. Across the undersampled reconstruction datasets, box-based reconstruction metrics correlated more strongly with detection performance than traditional image-based metrics (mAP to box-based structural similarity index [SSIM], r = 0.81, P < .01; mAP to image-based SSIM, r = 0.64, P = .01). In 50 participants, AI assistance improved radiologists' accuracy on reconstructed images. Sensitivity increased from 77.27% (95% CI, 65.83-85.72; 51/66) to 80.30% (95% CI, 69.16-88.11; 53/66), and specificity improved from 88.46% (95% CI, 83.73-91.95; 207/234) to 90.60% (95% CI, 86.18-93.71; 212/234) (P < .05).
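The reported summary statistics can be checked with a few lines of arithmetic. The sketch below recomputes the F1 score as the harmonic mean of precision and recall, and recomputes sensitivity and specificity from the reported counts. The abstract does not state how the 95% confidence intervals were derived; the Wilson score interval is used here as an assumption because it closely reproduces the reported bounds. The function names are illustrative.

```python
import math

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half_width, center + half_width

# Precision and recall as reported for the original images (percent).
print(f"F1: {f1_score(70.53, 72.17):.2f}%")

# Counts as reported for the reader study on reconstructed images.
reader_results = [
    ("Sensitivity, unassisted", 51, 66),
    ("Sensitivity, AI-assisted", 53, 66),
    ("Specificity, unassisted", 207, 234),
    ("Specificity, AI-assisted", 212, 234),
]
for label, successes, total in reader_results:
    low, high = wilson_interval(successes, total)
    print(f"{label}: {100 * successes / total:.2f}% "
          f"(95% CI, {100 * low:.2f}-{100 * high:.2f}; {successes}/{total})")
```

The paired comparison itself (McNemar's test) cannot be reproduced from the abstract, because it depends on the number of discordant readings between the unassisted and assisted sessions, which is not reported here.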
AI-assisted meniscal anomaly detection enhanced radiologists' interpretation of undersampled, DL-reconstructed knee MRI. Anomaly detection may serve as a complementary tool alongside other reconstruction metrics to assess the preservation of clinically important features in reconstructed images, warranting further investigation.