ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis.

Affiliations

Department of Medicine IV, LMU University Hospital, Munich, Germany.

Department of Medicine I, LMU University Hospital, Munich, Germany.

Publication information

J Med Internet Res. 2024 Jul 8;26:e56110. doi: 10.2196/56110.

DOI:10.2196/56110
PMID:38976865
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11263899/
Abstract

BACKGROUND

OpenAI's ChatGPT is a pioneering artificial intelligence (AI) in the field of natural language processing, and it holds significant potential in medicine for providing treatment advice. Additionally, recent studies have demonstrated promising results using ChatGPT for emergency medicine triage. However, its diagnostic accuracy in the emergency department (ED) has not yet been evaluated.

OBJECTIVE

This study compares the diagnostic accuracy of ChatGPT with GPT-3.5 and GPT-4 and primary treating resident physicians in an ED setting.

METHODS

Among 100 adults admitted to our ED in January 2023 with internal medicine issues, the diagnostic accuracy was assessed by comparing the diagnoses made by ED resident physicians and those made by ChatGPT with GPT-3.5 or GPT-4 against the final hospital discharge diagnosis, using a point system for grading accuracy.
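One way to picture the comparison described above is as a paired test on per-patient accuracy points. The sketch below is illustrative only: the abstract does not state the point scale or the statistical test used, so the 0 to 2 scale, the toy scores, and the choice of an exact sign test (with the hypothetical helper `paired_sign_test`) are all assumptions rather than the authors' method.

```python
from math import comb


def paired_sign_test(scores_a, scores_b):
    """Two-sided exact sign test on paired scores; tied pairs are dropped.

    Returns (wins_a, wins_b, p_value). Assumed stand-in for whatever paired
    test the study actually used, which the abstract does not name.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0  # every pair tied: no evidence either way
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical points per patient (0 = wrong, 1 = partly right, 2 = matches
    # the discharge diagnosis). Toy values, not data from the study.
    model_points = [2, 2, 1, 2, 0, 2, 1, 2]
    physician_points = [1, 2, 0, 1, 0, 1, 1, 0]
    print(paired_sign_test(model_points, physician_points))
```

Because each patient is graded for every rater against the same discharge diagnosis, a paired test is the natural shape for this comparison; a rank-based alternative such as the Wilcoxon signed-rank test would also fit ordinal point scores.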

RESULTS

The study enrolled 100 patients with a median age of 72 (IQR 58.5-82.0) years who were admitted to our internal medicine ED primarily for cardiovascular, endocrine, gastrointestinal, or infectious diseases. GPT-4 outperformed both GPT-3.5 (P<.001) and ED resident physicians (P=.01) in diagnostic accuracy for internal medicine emergencies. Furthermore, across various disease subgroups, GPT-4 consistently outperformed GPT-3.5 and resident physicians. It demonstrated significant superiority in cardiovascular (GPT-4 vs ED physicians: P=.03) and endocrine or gastrointestinal diseases (GPT-4 vs GPT-3.5: P=.01). However, in other categories, the differences were not statistically significant.

CONCLUSIONS

In this study, which compared the diagnostic accuracy of GPT-3.5, GPT-4, and ED resident physicians against a discharge diagnosis gold standard, GPT-4 outperformed both the resident physicians and its predecessor, GPT-3.5. Despite the retrospective design of the study and its limited sample size, the results underscore the potential of AI as a supportive diagnostic tool in ED settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47a2/11263899/ce81ad18b74e/jmir_v26i1e56110_fig1.jpg

Similar articles

1
ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis.
J Med Internet Res. 2024 Jul 8;26:e56110. doi: 10.2196/56110.
2
Assessing the precision of artificial intelligence in ED triage decisions: Insights from a study with ChatGPT.
Am J Emerg Med. 2024 Apr;78:170-175. doi: 10.1016/j.ajem.2024.01.037. Epub 2024 Jan 24.
3
Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.
J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.
4
Emergency department triaging using ChatGPT based on emergency severity index principles: a cross-sectional study.
Sci Rep. 2024 Sep 27;14(1):22106. doi: 10.1038/s41598-024-73229-7.
5
Patient-Representing Population's Perceptions of GPT-Generated Versus Standard Emergency Department Discharge Instructions: Randomized Blind Survey Assessment.
J Med Internet Res. 2024 Aug 2;26:e60336. doi: 10.2196/60336.
6
Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study.
JMIR Mhealth Uhealth. 2023 Oct 3;11:e49995. doi: 10.2196/49995.
7
Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4.
J Med Internet Res. 2024 Jun 27;26:e54571. doi: 10.2196/54571.
8
The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study.
Lancet Digit Health. 2024 Aug;6(8):e555-e561. doi: 10.1016/S2589-7500(24)00097-9.
9
Comparative analysis of ChatGPT, Gemini and emergency medicine specialist in ESI triage assessment.
Am J Emerg Med. 2024 Jul;81:146-150. doi: 10.1016/j.ajem.2024.05.001. Epub 2024 May 3.
10
Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage of Simulated Patients Using the Simple Triage and Rapid Treatment (START) Protocol: Gage Repeatability and Reproducibility Study.
J Med Internet Res. 2024 Sep 30;26:e55648. doi: 10.2196/55648.

Cited by

1
Artificial Intelligence Chatbots in Pediatric Emergencies: A Reliable Lifeline or a Risk?
Cureus. 2025 Aug 1;17(8):e89234. doi: 10.7759/cureus.89234. eCollection 2025 Aug.
2
A bibliometric analysis of large language model-based AI chatbots in surgery.
Ann Med Surg (Lond). 2025 May 12;87(7):4127-4138. doi: 10.1097/MS9.0000000000003234. eCollection 2025 Jul.
3
The performance of ChatGPT on medical image-based assessments and implications for medical education.
BMC Med Educ. 2025 Aug 23;25(1):1192. doi: 10.1186/s12909-025-07752-0.
4
Co-production of Diagnostic Excellence - Patients, Clinicians, and Artificial Intelligence Comment on "Achieving Diagnostic Excellence: Roadmaps to Develop and Use Patient-Reported Measures With an Equity Lens".
Int J Health Policy Manag. 2025;14:8973. doi: 10.34172/ijhpm.8973. Epub 2025 Jun 17.
5
Can AI match emergency physicians in managing common emergency cases? A comparative performance evaluation.
BMC Emerg Med. 2025 Jul 31;25(1):142. doi: 10.1186/s12873-025-01303-y.
6
Use of a Medical Communication Framework to Assess the Quality of Generative Artificial Intelligence Replies to Primary Care Patient Portal Messages: Content Analysis.
JMIR Form Res. 2025 Jul 31;9:e71966. doi: 10.2196/71966.
7
Artificial intelligence in coronary angiography: benchmarking the diagnostic accuracy of ChatGPT-4o against interventional cardiologists.
Open Heart. 2025 Jul 20;12(2):e003316. doi: 10.1136/openhrt-2025-003316.
8
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.
J Med Internet Res. 2025 Jun 9;27:e72062. doi: 10.2196/72062.
9
ChatGPT-o1 Preview Outperforms ChatGPT-4 as a Diagnostic Support Tool for Ankle Pain Triage in Emergency Settings.
Arch Acad Emerg Med. 2025 Apr 5;13(1):e42. doi: 10.22037/aaemj.v13i1.2580. eCollection 2025.
10
A Practical Guide to the Utilization of ChatGPT in the Emergency Department: A Systematic Review of Current Applications, Future Directions, and Limitations.
Cureus. 2025 Apr 6;17(4):e81802. doi: 10.7759/cureus.81802. eCollection 2025 Apr.

References

1
ChatGPT-Generated Differential Diagnosis Lists for Complex Case-Derived Clinical Vignettes: Diagnostic Accuracy Evaluation.
JMIR Med Inform. 2023 Oct 9;11:e48808. doi: 10.2196/48808.
2
Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study.
JMIR Mhealth Uhealth. 2023 Oct 3;11:e49995. doi: 10.2196/49995.
3
Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study.
J Med Internet Res. 2023 Aug 22;25:e48659. doi: 10.2196/48659.
4
Machine learning for ECG diagnosis and risk stratification of occlusion myocardial infarction.
Nat Med. 2023 Jul;29(7):1804-1813. doi: 10.1038/s41591-023-02396-3. Epub 2023 Jun 29.
5
ChatGPT: A Valuable Tool for Emergency Medical Assistance.
Ann Emerg Med. 2023 Sep;82(3):411-413. doi: 10.1016/j.annemergmed.2023.04.027. Epub 2023 Jun 17.
6
The ChatGPT Era: Artificial Intelligence in Emergency Medicine.
Ann Emerg Med. 2023 Jun;81(6):764-765. doi: 10.1016/j.annemergmed.2023.01.022.
7
Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study.
Int J Environ Res Public Health. 2023 Feb 15;20(4):3378. doi: 10.3390/ijerph20043378.
8
Appropriateness of Cardiovascular Disease Prevention Recommendations Obtained From a Popular Online Chat-Based Artificial Intelligence Model.
JAMA. 2023 Mar 14;329(10):842-844. doi: 10.1001/jama.2023.1044.
9
AI bot ChatGPT writes smart essays - should professors worry?
Nature. 2022 Dec 9. doi: 10.1038/d41586-022-04397-7.
10
Review of the Basics of Cognitive Error in Emergency Medicine: Still No Easy Answers.
West J Emerg Med. 2020 Nov 2;21(6):125-131. doi: 10.5811/westjem.2020.7.47832.