ChatGPT-o1 Preview Outperforms ChatGPT-4 as a Diagnostic Support Tool for Ankle Pain Triage in Emergency Settings.

Authors

Hosseini-Monfared Pooya, Amiri Shayan, Mirahmadi Alireza, Shahbazi Amirhossein, Alamian Aliasghar, Azizi Mohammad, Kazemi Seyed Morteza

Affiliations

Bone Joint and Related Tissues Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Bone and Joint Reconstruction Research Center, Department of Orthopedics, School of Medicine, Iran University of Medical Sciences, Tehran, Iran.

Publication information

Arch Acad Emerg Med. 2025 Apr 5;13(1):e42. doi: 10.22037/aaemj.v13i1.2580. eCollection 2025.


PMID: 40487902
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12145124/

Abstract

INTRODUCTION: ChatGPT, a general-purpose language model, is not specifically optimized for medical applications. This study aimed to assess the performance of ChatGPT-4 and ChatGPT-o1 preview in generating differential diagnoses for common cases of ankle pain in emergency settings.

METHODS: Common presentations of ankle pain were identified through consultations with an experienced orthopedic surgeon and a review of relevant hospital and social media sources. To replicate typical patient inquiries, questions were crafted in simple, non-technical language, requesting three possible differential diagnoses for each scenario. The second phase involved designing case vignettes reflecting scenarios typical for triage nurses or physicians. Responses from ChatGPT were evaluated against a benchmark established by two experienced orthopedic surgeons, with a scoring system assessing the accuracy, clarity, and relevance of the differential diagnoses based on their order.

RESULTS: In 21 ankle pain presentations, ChatGPT-o1 preview outperformed ChatGPT-4 in both accuracy and clarity, with only the clarity score reaching statistical significance (p < 0.001). ChatGPT-o1 preview also had a significantly higher total score (p = 0.004). In 15 case vignettes, ChatGPT-o1 preview scored better on diagnostic and management clarity, though differences in diagnostic accuracy were not statistically significant. Among 51 questions, ChatGPT-4 and ChatGPT-o1 preview produced incorrect responses for 5 (9.8%) and 4 (7.8%) questions, respectively. Inter-rater reliability analysis demonstrated excellent reliability of the scoring system, with intraclass correlation coefficients of 0.99 (95% CI, 0.998-0.999) for accuracy scores and 0.99 (95% CI, 0.990-0.995) for clarity scores.

CONCLUSION: Our findings demonstrated that both ChatGPT-4 and ChatGPT-o1 preview provide acceptable performance in the triage of ankle pain cases in emergency settings. ChatGPT-o1 preview outperformed ChatGPT-4, offering clearer and more precise responses. While both models show potential as supportive tools, their role should remain supervised and strictly supplementary to clinical expertise.
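
As a quick arithmetic check on the incorrect-response rates reported in the results, the percentages can be recomputed from the stated counts. The sketch below is illustrative only: the function name `error_rate` and the variable names are assumptions introduced here, not part of the study, and the only inputs are the counts given in the abstract (5 and 4 incorrect responses out of 51 questions).

```python
def error_rate(incorrect: int, total: int) -> float:
    """Return the share of incorrect responses as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * incorrect / total, 1)


# Counts taken from the abstract; the names are hypothetical.
TOTAL_QUESTIONS = 51
incorrect_counts = {"ChatGPT-4": 5, "ChatGPT-o1 preview": 4}

rates = {
    model: error_rate(count, TOTAL_QUESTIONS)
    for model, count in incorrect_counts.items()
}

for model, rate in rates.items():
    print(f"{model}: {rate}% incorrect")
```

Both recomputed values agree with the 9.8% and 7.8% figures in the abstract.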

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf41/12145124/8b9f4429eceb/aaem-13-e42-g001.jpg
