ChatGPT-o1 Preview Outperforms ChatGPT-4 as a Diagnostic Support Tool for Ankle Pain Triage in Emergency Settings.

Authors

Hosseini-Monfared Pooya, Amiri Shayan, Mirahmadi Alireza, Shahbazi Amirhossein, Alamian Aliasghar, Azizi Mohammad, Kazemi Seyed Morteza

Affiliations

Bone Joint and Related Tissues Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Bone and Joint Reconstruction Research Center, Department of Orthopedics, School of Medicine, Iran University of Medical Sciences, Tehran, Iran.

Publication

Arch Acad Emerg Med. 2025 Apr 5;13(1):e42. doi: 10.22037/aaemj.v13i1.2580. eCollection 2025.

DOI:10.22037/aaemj.v13i1.2580
PMID:40487902
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12145124/
Abstract

INTRODUCTION

ChatGPT, a general-purpose language model, is not specifically optimized for medical applications. This study aimed to assess the performance of ChatGPT-4 and o1-preview in generating differential diagnoses for common cases of ankle pain in emergency settings.

METHODS

Common presentations of ankle pain were identified through consultations with an experienced orthopedic surgeon and a review of relevant hospital and social media sources. To replicate typical patient inquiries, questions were crafted in simple, non-technical language, requesting three possible differential diagnoses for each scenario. The second phase involved designing case vignettes reflecting scenarios typical for triage nurses or physicians. Responses from ChatGPT were evaluated against a benchmark established by two experienced orthopedic surgeons, with a scoring system assessing the accuracy, clarity, and relevance of the differential diagnoses based on their order.

RESULTS

In 21 ankle pain presentations, ChatGPT-o1 preview outperformed ChatGPT-4 in both accuracy and clarity, with only the clarity score reaching statistical significance (p < 0.001). ChatGPT-o1 preview also had a significantly higher total score (p = 0.004). In 15 case vignettes, ChatGPT-o1 preview scored better on diagnostic and management clarity, though differences in diagnostic accuracy were not statistically significant. Among 51 questions, ChatGPT-4 and ChatGPT-o1 preview produced incorrect responses for 5 (9.8%) and 4 (7.8%) questions, respectively. Inter-rater reliability analysis demonstrated excellent reliability of the scoring system, with intraclass correlation coefficients of 0.99 (95% CI, 0.998-0.999) for accuracy scores and 0.99 (95% CI, 0.990-0.995) for clarity scores.
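
The reported error rates can be checked from the counts in the paragraph above. The following is a minimal sketch that only recomputes those percentages; the function and variable names are illustrative and do not come from the study.

```python
# Counts taken from the Results paragraph: incorrect responses out of 51 questions.
TOTAL_QUESTIONS = 51
incorrect = {"ChatGPT-4": 5, "ChatGPT-o1 preview": 4}


def error_rate_percent(wrong: int, total: int) -> float:
    """Return the share of incorrect responses as a percentage, rounded to one decimal."""
    return round(100 * wrong / total, 1)


# Hypothetical helper structure: model name -> percentage of incorrect responses.
rates = {model: error_rate_percent(n, TOTAL_QUESTIONS) for model, n in incorrect.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}% incorrect")
```

Both values agree with the 9.8% and 7.8% stated in the abstract; the difference between the two models is a single question.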

CONCLUSION

Our findings demonstrated that both ChatGPT-4 and ChatGPT-o1 preview provide acceptable performance in the triage of ankle pain cases in emergency settings. ChatGPT-o1 preview outperformed ChatGPT-4, offering clearer and more precise responses. While both models show potential as supportive tools, their role should remain supervised and strictly supplementary to clinical expertise.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf41/12145124/8b9f4429eceb/aaem-13-e42-g001.jpg

Similar articles

1. ChatGPT-o1 Preview Outperforms ChatGPT-4 as a Diagnostic Support Tool for Ankle Pain Triage in Emergency Settings.
Arch Acad Emerg Med. 2025 Apr 5;13(1):e42. doi: 10.22037/aaemj.v13i1.2580. eCollection 2025.
2. The Large Language Model ChatGPT-4 Exhibits Excellent Triage Capabilities and Diagnostic Performance for Patients Presenting With Various Causes of Knee Pain.
Arthroscopy. 2025 May;41(5):1438-1447.e14. doi: 10.1016/j.arthro.2024.06.021. Epub 2024 Jun 24.
3. Artificial intelligence versus orthopedic surgeons as an orthopedic consultant in the emergency department.
Injury. 2025 Apr;56(4):112297. doi: 10.1016/j.injury.2025.112297. Epub 2025 Mar 22.
4. Comparative analysis of the performance of the large language models ChatGPT-3.5, ChatGPT-4 and Open AI-o1 in the field of Programmed Cell Death in myeloma.
Discov Oncol. 2025 May 23;16(1):870. doi: 10.1007/s12672-025-02648-3.
5. Evaluating the reference accuracy of large language models in radiology: a comparative study across subspecialties.
Diagn Interv Radiol. 2025 May 12. doi: 10.4274/dir.2025.253101.
6. Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics.
Sci Rep. 2025 Mar 26;15(1):10426. doi: 10.1038/s41598-025-95233-1.
7. Performance of Artificial Intelligence in Addressing Questions Regarding the Management of Pediatric Supracondylar Humerus Fractures.
J Pediatr Soc North Am. 2025 Mar 9;11:100164. doi: 10.1016/j.jposna.2025.100164. eCollection 2025 May.
8. ChatGPT With GPT-4 Outperforms Emergency Department Physicians in Diagnostic Accuracy: Retrospective Analysis.
J Med Internet Res. 2024 Jul 8;26:e56110. doi: 10.2196/56110.
9. Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study.
J Med Internet Res. 2025 Apr 10;27:e67883. doi: 10.2196/67883.
10. Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses.
Am J Emerg Med. 2025 Mar;89:174-181. doi: 10.1016/j.ajem.2024.12.024. Epub 2024 Dec 19.
