Evaluation of Advanced Artificial Intelligence Algorithms' Diagnostic Efficacy in Acute Ischemic Stroke: A Comparative Analysis of ChatGPT-4o and Claude 3.5 Sonnet Models.

Authors

Koyun Mustafa, Taskent Ismail

Affiliations

Department of Radiology, Kastamonu Training and Research Hospital, Kastamonu 37150, Turkey.

Department of Radiology, Kastamonu University, Kastamonu 37150, Turkey.

Publication

J Clin Med. 2025 Jan 17;14(2):571. doi: 10.3390/jcm14020571.

Abstract

Acute ischemic stroke (AIS) is a leading cause of mortality and disability worldwide, with early and accurate diagnosis being critical for timely intervention and improved patient outcomes. This retrospective study aimed to assess the diagnostic performance of two advanced artificial intelligence (AI) models, Chat Generative Pre-trained Transformer (ChatGPT-4o) and Claude 3.5 Sonnet, in identifying AIS from diffusion-weighted imaging (DWI). The DWI images of a total of 110 cases (AIS group: n = 55, healthy controls: n = 55) were provided to the AI models via standardized prompts. The models' responses were compared to radiologists' gold-standard evaluations, and performance metrics such as sensitivity, specificity, and diagnostic accuracy were calculated. Both models exhibited a high sensitivity for AIS detection (ChatGPT-4o: 100%, Claude 3.5 Sonnet: 94.5%). However, ChatGPT-4o demonstrated a significantly lower specificity (3.6%) compared to Claude 3.5 Sonnet (74.5%). The agreement with radiologists was poor for ChatGPT-4o (κ = 0.036; 95% CI: -0.013, 0.085) but good for Claude 3.5 Sonnet (κ = 0.691; 95% CI: 0.558, 0.824). In terms of the AIS hemispheric localization accuracy, Claude 3.5 Sonnet (67.2%) outperformed ChatGPT-4o (32.7%). Similarly, for specific AIS localization, Claude 3.5 Sonnet (30.9%) showed greater accuracy than ChatGPT-4o (7.3%), with these differences being statistically significant (p < 0.05). This study highlights the superior diagnostic performance of Claude 3.5 Sonnet compared to ChatGPT-4o in identifying AIS from DWI. Despite this advantage, both models demonstrated notable limitations in accuracy, emphasizing the need for further development before achieving full clinical applicability. These findings underline the potential of AI tools in radiological diagnostics while acknowledging their current limitations.
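
The relationship between the reported sensitivity, specificity, and κ values can be illustrated with a short calculation. The sketch below is not the authors' analysis code. The 2×2 cell counts are an assumption, back-calculated from the percentages in the abstract and the 55/55 group sizes (for example, 3.6% specificity in 55 controls corresponds to 2 true negatives); the abstract itself does not report them. The names `Confusion`, `sensitivity`, `specificity`, and `cohen_kappa` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 table of model output versus the radiologists' reference standard."""
    tp: int  # AIS cases the model called AIS
    fn: int  # AIS cases the model called normal
    fp: int  # healthy controls the model called AIS
    tn: int  # healthy controls the model called normal

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


def cohen_kappa(c: Confusion) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = c.total
    observed = (c.tp + c.tn) / n
    # Chance agreement from the row and column marginals of the 2x2 table.
    expected = (
        (c.tp + c.fn) * (c.tp + c.fp) + (c.tn + c.fp) * (c.tn + c.fn)
    ) / n**2
    return (observed - expected) / (1 - expected)


# ASSUMED counts, reconstructed from the abstract's percentages (n = 55 per group).
chatgpt_4o = Confusion(tp=55, fn=0, fp=53, tn=2)
claude_35_sonnet = Confusion(tp=52, fn=3, fp=14, tn=41)

for name, table in [("ChatGPT-4o", chatgpt_4o), ("Claude 3.5 Sonnet", claude_35_sonnet)]:
    print(
        f"{name}: sensitivity={sensitivity(table):.3f}, "
        f"specificity={specificity(table):.3f}, kappa={cohen_kappa(table):.3f}"
    )
```

Under these assumed counts the calculation reproduces the reported point estimates (κ ≈ 0.036 and κ ≈ 0.691). It also shows why a model that labels nearly every image as AIS can reach 100% sensitivity while agreeing with the reference standard barely better than chance.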

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75a/11765597/44894faf3c10/jcm-14-00571-g001.jpg
