Department of Radiology, Ministry of Health Ankara 29 Mayis State Hospital, Ankara, Türkiye.
Department of Radiology, Ankara Mamak State Hospital, Ankara, Türkiye.
Clin Imaging. 2024 Oct;114:110271. doi: 10.1016/j.clinimag.2024.110271. Epub 2024 Aug 31.
The advent of large language models (LLMs) marks a transformative leap in natural language processing, offering unprecedented potential in radiology, particularly for enhancing the accuracy and efficiency of coronary artery disease (CAD) diagnosis. While previous studies have explored the capabilities of specific LLMs such as ChatGPT in cardiac imaging, a comprehensive evaluation comparing multiple LLMs in the context of CAD-RADS 2.0 has been lacking. This study addresses this gap by assessing the performance of several LLMs, including ChatGPT 4, ChatGPT 4o, Claude 3 Opus, Gemini 1.5 Pro, Mistral Large, Meta Llama 3 70B, and Perplexity Pro, in answering 30 multiple-choice questions derived from the CAD-RADS 2.0 guidelines. Our findings reveal that ChatGPT 4o achieved the highest accuracy at 100%, with ChatGPT 4 and Claude 3 Opus closely following at 96.6%. The other models, including Mistral Large, Perplexity Pro, Meta Llama 3 70B, and Gemini 1.5 Pro, also demonstrated commendable performance, though with slightly lower accuracy, ranging from 90% to 93.3%. This study underscores the proficiency of current LLMs in understanding and applying CAD-RADS 2.0, suggesting their potential to significantly enhance radiological reporting and patient care in coronary artery disease. The variations in model performance highlight the need for further research, particularly into the visual diagnostic capabilities of LLMs, a critical component of radiology practice. This study provides a foundational comparison of LLMs in CAD-RADS 2.0 and sets the stage for future investigations into their broader applications in radiology, emphasizing the importance of integrating both text-based and visual knowledge for optimal clinical outcomes.