Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses.

Author information

Arslan B, Nuhoglu C, Satici M O, Altinbilek E

Affiliations

Department of Emergency Medicine, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

Publication information

Am J Emerg Med. 2025 Mar;89:174-181. doi: 10.1016/j.ajem.2024.12.024. Epub 2024 Dec 19.

Abstract

BACKGROUND

The number of emergency department (ED) visits has been increasing steadily worldwide. Artificial intelligence (AI) technologies, including generative AI models based on large language models (LLMs), have shown promise in improving triage accuracy. This study evaluates the performance of ChatGPT and Copilot in triage at a high-volume urban hospital, hypothesizing that these tools can match the accuracy of trained physicians and reduce human bias amid the challenges of ED crowding.

METHODS

This single-center, prospective observational study was conducted in an urban ED over one week. Adult patients were enrolled during randomly selected 24-h intervals. Minors, trauma cases, and patients with incomplete data were excluded. Triage nurses assessed patients while an emergency medicine (EM) physician documented clinical vignettes and assigned Emergency Severity Index (ESI) levels. These vignettes were then entered into ChatGPT and Copilot, and their outputs were compared with the triage nurses' decisions.
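The comparison described above can be illustrated with a small Python sketch. It assumes, as the Results imply, that the EM physician's ESI level is the reference; that ESI runs from 1 (most urgent) to 5 (least urgent), so a numerically higher level than the reference counts as under-triage; and that "high acuity" means ESI 1 or 2. That last cut-off is an assumption, since the abstract does not define it. All function names and data are hypothetical and do not reproduce the authors' analysis.

```python
from collections import Counter


def classify_triage(assigned: int, reference: int) -> str:
    """Compare one assigned ESI level with the reference (EM physician) level.

    ESI runs from 1 (most urgent) to 5 (least urgent), so an assigned level
    numerically higher than the reference means the patient was under-triaged.
    """
    if not (1 <= assigned <= 5 and 1 <= reference <= 5):
        raise ValueError("ESI levels must be integers from 1 to 5")
    if assigned == reference:
        return "correct"
    return "under-triage" if assigned > reference else "over-triage"


def summarize(assigned_levels, reference_levels):
    """Return exact-match accuracy and counts of correct/under-/over-triage."""
    if not reference_levels or len(assigned_levels) != len(reference_levels):
        raise ValueError("need two non-empty lists of equal length")
    counts = Counter(
        classify_triage(a, r) for a, r in zip(assigned_levels, reference_levels)
    )
    # Counter returns 0 for labels that never occurred.
    accuracy = counts["correct"] / len(reference_levels)
    return accuracy, counts


def high_acuity_recognition(assigned_levels, reference_levels, cutoff=2):
    """Share of reference high-acuity patients (ESI <= cutoff) that the rater
    also placed at ESI <= cutoff. cutoff=2 is an ASSUMED definition."""
    pairs = [(a, r) for a, r in zip(assigned_levels, reference_levels) if r <= cutoff]
    if not pairs:
        raise ValueError("no high-acuity patients in the reference labels")
    return sum(a <= cutoff for a, _ in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical ESI levels for five patients (not study data).
    reference = [2, 3, 3, 4, 1]
    nurse = [3, 3, 4, 4, 2]
    accuracy, counts = summarize(nurse, reference)
    print(accuracy, dict(counts))
    print(high_acuity_recognition(nurse, reference))
```

With these made-up levels, the nurse matches the reference for two of five patients (accuracy 0.4), under-triages the other three, and recognizes one of the two reference high-acuity patients (0.5).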

RESULTS

The overall triage accuracy was 65.2 % for nurses, 66.5 % for ChatGPT, and 61.8 % for Copilot, with no significant difference (p = 0.000). Moderate agreement was observed between the EM physician and ChatGPT, triage nurses, and Copilot (Cohen's kappa = 0.537, 0.477, and 0.472, respectively). In recognizing high-acuity patients, ChatGPT and Copilot outperformed triage nurses (87.8 % and 85.7 %, respectively, versus 32.7 %). Compared with ChatGPT and Copilot, nurses significantly under-triaged patients (p < 0.05). Analysis of the predictive performance of ChatGPT, Copilot, and triage nurses showed differing discrimination across ESI levels, all statistically significant (p < 0.05). ChatGPT and Copilot showed consistent accuracy across age, gender, and admission time, whereas triage nurses were more likely to mistriage patients younger than 45 years.
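The agreement figures above are Cohen's kappa values, which correct raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of matching labels and p_e is the chance agreement implied by each rater's marginal label frequencies. The Python sketch below computes the unweighted form. The abstract does not say whether a weighted variant was used, and the function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty label lists of equal length")
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each label, multiply the two raters' marginal
    # proportions, then sum (Counter returns 0 for labels rater_b never used).
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)
```

For the hypothetical ESI labels [1, 2, 2, 3, 3, 3] and [1, 2, 3, 3, 3, 2], p_o = 4/6 and p_e = 14/36, giving κ = 5/11 ≈ 0.45. On the commonly used Landis and Koch scale, values from 0.41 to 0.60 are described as moderate agreement, which is consistent with how the abstract characterizes the reported range of 0.472 to 0.537.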

CONCLUSION

ChatGPT and Copilot outperform traditional nurse triage in identifying high-acuity patients, but real-time ED capacity data are crucial to prevent overcrowding and ensure high-quality emergency care.
