Avidan Yuval, Tabachnikov Vsevolod, Court Orel Ben, Khoury Razi, Aker Amir
Department of Cardiology, Lady Davis Carmel Medical Center, Haifa, Israel; The Ruth and Bruce Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel.
J Electrocardiol. 2025 Jan-Feb;88:153851. doi: 10.1016/j.jelectrocard.2024.153851. Epub 2024 Dec 7.
Atrial fibrillation (AF) is the most common arrhythmia in clinical practice, yet concerns about its ECG interpretation persist among healthcare providers. Confounding factors contribute to false-positive and false-negative AF diagnoses, potentially leading to missed diagnoses. Advances in artificial intelligence show promise in electrocardiogram (ECG) interpretation. We sought to examine the diagnostic accuracy of ChatGPT-4omni (GPT-4o), equipped with image evaluation capabilities, in interpreting ECGs with confounding factors and to compare its performance with that of physicians.
Twenty ECG cases, divided into Group A (10 cases of AF or atrial flutter) and Group B (10 cases of sinus or another atrial rhythm), were crafted into multiple-choice questions. A total of 100 practitioners (25 each from emergency medicine, internal medicine, primary care, and cardiology) were asked to identify the underlying rhythm. GPT-4o was then prompted with the same questions in five separate sessions.
GPT-4o performed inadequately, averaging 3 (±2) correct answers on Group A questions and 5.40 (±1.34) on Group B questions. When accuracy across all ECG questions was examined, no significant difference was found between GPT-4o, internists, and primary care physicians (p = 0.952 and p = 0.852, respectively). Cardiologists outperformed the other medical disciplines and GPT-4o (p < 0.001); emergency physicians followed in accuracy, though their comparison with GPT-4o indicated only a trend (p = 0.068).
GPT-4o demonstrated suboptimal accuracy, with significant under- and over-recognition of AF in ECGs with confounding factors. Despite its potential as a supportive tool for ECG interpretation, its performance did not surpass that of medical practitioners, underscoring the continued importance of human expertise in complex diagnostics.