Papiashvili Nikoloz, Abshilava Christina, Malik Mohammad H, Dzindzibadze Tinatin, Anderson Emeli J, Gagua Sopio, Guruli Vladimir, Amarasinghe Kaveesha, Gonjilashvili Nana, Tchokhonelidze Irma
Faculty of Medicine, Tbilisi State Medical University, Tbilisi, GEO.
Department of Nephrology, Ingorokva High Medical Technology University Clinic, Tbilisi, GEO.
Cureus. 2025 Jul 12;17(7):e87761. doi: 10.7759/cureus.87761. eCollection 2025 Jul.
Introduction: The implementation of artificial intelligence (AI) in radiology as a medical decision support system has the potential to enhance diagnostic accuracy and improve patient outcomes. This retrospective study aimed to evaluate the diagnostic capabilities of GPT-4o in interpreting radiological images, specifically X-ray, CT, and MRI images, across various organ systems and disease types.

Methods: A total of 377 cases were collected and presented to GPT-4o with a standardized prompt and no clinical context. The responses were assessed by three independent raters using a five-point rating system.

Results: X-ray images had, on average, 2.21 times higher odds of being interpreted accurately than CT scans (odds ratio (OR): 2.21; 95% confidence interval (CI): 1.33-3.69), while pelvic images had, on average, 6.25 times lower odds of being interpreted accurately than abdominal images (OR: 0.16; 95% CI: 0.02-0.56). Additionally, neoplastic conditions had, on average, 2.7 times lower odds of being interpreted accurately than bleeding conditions (OR: 0.37; 95% CI: 0.16-0.84).

Conclusion: A bimodal distribution of median ratings highlights an overreliance on similarity to previously encountered images and emphasizes the need to develop a systematic approach to image analysis. Future research should prioritize eliminating hallucinations, establishing standardized evaluation criteria, and exploring methods to integrate visual and text-based data in a balanced manner. Additionally, efforts should be directed towards enhancing dataset diversity to improve the model's overall accuracy and generalizability.
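The "times lower" figures in the Results are reciprocals of odds ratios below 1 (for example, 1/0.16 = 6.25), and they describe odds rather than probabilities. The short sketch below only re-derives those reciprocals, together with the correspondingly inverted 95% confidence intervals, from the published rounded values. It is an illustrative arithmetic check, not part of the authors' analysis; the helper name `invert_or` is hypothetical, and because the inputs are rounded, the outputs only approximate the unrounded estimates.

```python
# Reported odds ratios (OR) with 95% CI bounds, taken from the Results.
# An OR below 1 can be restated as "1/OR times lower odds"; under the same
# reciprocal transformation the CI bounds invert and swap places.
reported = {
    "pelvis vs abdomen": (0.16, 0.02, 0.56),
    "neoplastic vs bleeding": (0.37, 0.16, 0.84),
}


def invert_or(or_value, ci_low, ci_high):
    """Hypothetical helper: return (1/OR, 1/CI_high, 1/CI_low)."""
    return 1 / or_value, 1 / ci_high, 1 / ci_low


for label, (or_value, lo, hi) in reported.items():
    point, new_lo, new_hi = invert_or(or_value, lo, hi)
    print(f"{label}: {point:.2f}x lower odds (95% CI {new_lo:.2f}-{new_hi:.2f})")
```

For the pelvis comparison this gives 6.25 times lower odds with an inverted interval of roughly 1.79 to 50, which shows how wide that estimate is.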