Department of Medicine IV, LMU University Hospital, Munich, Germany.
Department of Medicine I, LMU University Hospital, Munich, Germany.
J Med Internet Res. 2024 Jul 8;26:e56110. doi: 10.2196/56110.
BACKGROUND: OpenAI's ChatGPT is a pioneering artificial intelligence (AI) system in the field of natural language processing, and it holds significant potential in medicine for providing treatment advice. Recent studies have also demonstrated promising results using ChatGPT for emergency medicine triage. However, its diagnostic accuracy in the emergency department (ED) has not yet been evaluated.
OBJECTIVE: This study compares the diagnostic accuracy of ChatGPT (GPT-3.5 and GPT-4) with that of primary treating resident physicians in an ED setting.
METHODS: Among 100 adults admitted to our ED in January 2023 with internal medicine issues, diagnostic accuracy was assessed by comparing the diagnoses made by ED resident physicians and by ChatGPT (GPT-3.5 or GPT-4) against the final hospital discharge diagnosis, graded on a point system.
RESULTS: The study enrolled 100 patients with a median age of 72 (IQR 58.5-82.0) years who were admitted to our internal medicine ED, primarily for cardiovascular, endocrine, gastrointestinal, or infectious diseases. GPT-4 outperformed both GPT-3.5 (P<.001) and ED resident physicians (P=.01) in diagnostic accuracy for internal medicine emergencies. Across disease subgroups, GPT-4 consistently outperformed GPT-3.5 and resident physicians, with significant superiority in cardiovascular diseases (GPT-4 vs ED physicians: P=.03) and endocrine or gastrointestinal diseases (GPT-4 vs GPT-3.5: P=.01). In the other categories, the differences were not statistically significant.
CONCLUSIONS: In this study, which compared the diagnostic accuracy of GPT-3.5, GPT-4, and ED resident physicians against a discharge diagnosis gold standard, GPT-4 outperformed both the resident physicians and its predecessor, GPT-3.5. Despite the retrospective design and limited sample size, the results underscore the potential of AI as a supportive diagnostic tool in ED settings.