Shekhar Aditya C, Kimbrell Joshua, Saharan Aaryan, Stebel Jacob, Ashley Evan, Abbott Ethan E
Icahn School of Medicine at Mount Sinai, New York City, NY, United States of America.
Department of Pre-Hospital Care, Jamaica Hospital Medical Center, New York City, NY, United States of America.
Am J Emerg Med. 2025 Mar;89:27-29. doi: 10.1016/j.ajem.2024.12.032. Epub 2024 Dec 11.
Large language models (LLMs) have grown in popularity in recent months and have demonstrated advanced clinical reasoning ability. Given the need to prioritize the sickest patients requesting emergency medical services (EMS), we sought to determine whether an LLM could accurately triage ambulance requests using real-world data from a major metropolitan area.
An LLM (ChatGPT 4o Mini, OpenAI, San Francisco, CA, USA) with no prior task-specific training was given real ambulance requests from a major metropolitan city in the United States. Requests were batched into groups of four, and the LLM was prompted to identify which of the four patients should be prioritized. The same groupings of four requests were then shown to a panel of experienced critical care paramedics, who voted on which patient should be prioritized.
Across 98 groupings of four ambulance requests (392 total requests), the LLM agreed with the paramedic panel in most cases (76.5%, n = 75). In groupings where the paramedic panel was unanimous in its decision (n = 48), the LLM agreed with the panel in 93.8% of groupings (n = 45).
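The agreement figures above can be reproduced from the reported counts. The following is a minimal sketch, not the authors' analysis code: the function names, the example votes, and the panel size of three are hypothetical, and the abstract does not say how split or tied panel votes were resolved, so the tie handling here is an assumption.

```python
from collections import Counter


def panel_choice(votes):
    """Return (most-voted patient, unanimous flag) for one grouping.

    `votes` holds one patient label per paramedic. On a tie, Counter
    returns the label seen first; this tie rule is an assumption and
    is not described in the abstract.
    """
    counts = Counter(votes)
    top, top_n = counts.most_common(1)[0]
    return top, top_n == len(votes)


def agreement_pct(agreed, total):
    """Percentage of groupings in which the LLM matched the panel."""
    return 100 * agreed / total


# Counts reported in the abstract.
overall = agreement_pct(75, 98)    # all groupings
unanimous = agreement_pct(45, 48)  # groupings with a unanimous panel

# Hypothetical three-member panel voting on patients labelled A to D.
example_choice, example_unanimous = panel_choice(["B", "B", "C"])

print(f"overall: {overall:.2f}%, unanimous subset: {unanimous:.2f}%")
```

Under these assumptions the overall rate is 75/98 and the unanimous-panel rate is 45/48, which correspond to the 76.5% and 93.8% reported above.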
Our preliminary analysis suggests that LLMs may become a useful tool for triage and resource allocation in emergency care settings, especially in cases where there is consensus among subject matter experts. Further research is needed to clarify how they can best be used.