The accuracy of Gemini, GPT-4, and GPT-4o in ECG analysis: A comparison with cardiologists and emergency medicine specialists.

Affiliation

Emergency Medicine, Department of Emergency Medicine, Hitit University Çorum Erol Olçok Education and Research Hospital, Çorum, Turkey.

Publication information

Am J Emerg Med. 2024 Oct;84:68-73. doi: 10.1016/j.ajem.2024.07.043. Epub 2024 Jul 30.

DOI:10.1016/j.ajem.2024.07.043

PMID:39096711

Abstract

INTRODUCTION

GPT-4, GPT-4o, and Gemini Advanced are well-known large language models (LLMs) capable of recognizing and interpreting visual data. Very few studies in the literature have examined the ECG performance of GPT-4, and none has examined the performance of Gemini or GPT-4o in ECG evaluation. The aim of our study is to evaluate the performance of GPT-4, GPT-4o, and Gemini in ECG evaluation, assess their usability in the medical field, and compare their accuracy in ECG interpretation with that of cardiologists and emergency medicine specialists.

METHODS

The study was conducted from May 14, 2024, to June 3, 2024. The book "150 ECG Cases," which contains two sections (daily routine ECGs and more challenging ECGs), served as the reference. Two emergency medicine specialists selected 20 ECG cases from each section, for a total of 40 cases. The questions were first evaluated by emergency medicine specialists and cardiologists. Next, a diagnostic question was entered daily into GPT-4, GPT-4o, and Gemini Advanced in separate chat interfaces. Finally, the responses provided by cardiologists, emergency medicine specialists, GPT-4, GPT-4o, and Gemini Advanced were statistically evaluated across three categories: routine daily ECGs, more challenging ECGs, and the total number of ECGs.

RESULTS

Cardiologists outperformed GPT-4, GPT-4o, and Gemini Advanced in all three groups. Emergency medicine specialists performed better than GPT-4o on routine daily ECG questions and on total ECG questions (p = 0.003 and p = 0.042, respectively). GPT-4o performed better than Gemini Advanced and GPT-4 on total ECG questions (p = 0.027 and p < 0.001, respectively), and it also outperformed Gemini Advanced on routine daily ECG questions (p = 0.004). Agreement among responses was weak for GPT-4 (p < 0.001, Fleiss' kappa = 0.265) and Gemini Advanced (p < 0.001, Fleiss' kappa = 0.347) and moderate for GPT-4o (p < 0.001, Fleiss' kappa = 0.514).
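
For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of how Fleiss' kappa can be computed. The abstract does not state which software or data layout the authors used, so the function name `fleiss_kappa`, the count-matrix layout, and the toy numbers are illustrative assumptions, not the study's method or data. Each row stands for one ECG question, each column for one possible answer, and each cell for how many repeated responses chose that answer.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a matrix where counts[i][j] is the number of
    ratings that assigned item i to category j.

    Every item must have the same total number of ratings (at least 2).
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_ratings = sum(counts[0])
    if n_ratings < 2:
        raise ValueError("each item needs at least two ratings")
    if any(sum(row) != n_ratings for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: for each item, the share of rating pairs that agree.
    pair_total = n_ratings * (n_ratings - 1)
    per_item = [
        (sum(c * c for c in row) - n_ratings) / pair_total for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: based on how often each category is used overall.
    category_share = [
        sum(row[j] for row in counts) / (n_items * n_ratings)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in category_share)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical data (NOT from the study): 4 questions, 5 repeated
    # responses each, 3 possible answer options per question.
    toy_counts = [
        [5, 0, 0],
        [3, 2, 0],
        [1, 1, 3],
        [0, 4, 1],
    ]
    print(f"Fleiss' kappa on toy data: {fleiss_kappa(toy_counts):.3f}")
```

Kappa equals 1 when every repeated response agrees and falls toward 0 as agreement approaches what chance alone would produce, which is why it serves as a consistency measure alongside accuracy.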

CONCLUSION

While GPT-4o shows promise, especially in more challenging ECG questions, and may have potential as an assistant for ECG evaluation, its performance in routine and overall assessments still lags behind human specialists. The limited accuracy and consistency of GPT-4 and Gemini suggest that their current use in clinical ECG interpretation is risky.
