Evaluating large language models on hospital health data for automated emergency triage.

Author information

Lafuente Carlos, Rahim Mehdi

Affiliations

DSIC, Universitat Politècnica de València, Valencia, Spain.

R&D, Air Liquide, Les Loges-en-Josas, France.

Publication information

Int J Comput Assist Radiol Surg. 2025 Jul 16. doi: 10.1007/s11548-025-03475-1.

Abstract

PURPOSE

Large language models (LLMs) have significant potential in healthcare because they can process unstructured text from electronic health records (EHRs) and generate knowledge with little or no training. In this study, we investigate the effectiveness of LLMs for clinical decision support, specifically in emergency department triage, where the volume of textual data is minimal compared to other scenarios such as making a clinical diagnosis.

METHODS

We benchmark LLMs against traditional machine learning (ML) approaches, using the Emergency Severity Index (ESI) as the gold-standard triage criterion. The benchmark includes general-purpose, specialised, and fine-tuned LLMs. All models are prompted to predict the ESI score from an EHR. We use a balanced subset (n = 1000) from MIMIC-IV-ED, a large database containing records of admissions to the emergency department of Beth Israel Deaconess Medical Center.

RESULTS

Our findings show that the best-performing models have an average F1-score below 0.60. Also, while zero-shot and fine-tuned LLMs can outperform standard ML models, their performance is surpassed by ML models augmented with features derived from LLMs or knowledge graphs.
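To make the reported metric concrete, the following is a minimal sketch of an average F1-score over ESI levels. The abstract does not say how the average is taken; this sketch assumes an unweighted (macro) average across the five ESI levels, and the names `ESI_LEVELS`, `per_class_f1`, and `macro_f1`, as well as the toy labels, are hypothetical and not from the study.

```python
# Assumption: ESI has five levels and the "average F1" is a macro average.
ESI_LEVELS = (1, 2, 3, 4, 5)


def per_class_f1(y_true, y_pred, label):
    """F1 for one ESI level, treating that level as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention chosen here: a level never seen or predicted scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=ESI_LEVELS):
    """Unweighted mean of the per-level F1 scores."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels for illustration only, not data from MIMIC-IV-ED.
    y_true = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
    y_pred = [1, 2, 3, 4, 5, 2, 3, 3, 5, 5]
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On a class-balanced subset such as the one described in the methods, a macro average gives each ESI level equal weight, so weak performance on any single level pulls the overall score down.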

CONCLUSION

LLMs show value for clinical decision support in scenarios with limited textual data, such as emergency department triage. The study advocates for integrating LLM knowledge representation to improve existing ML models rather than using LLMs in isolation, suggesting this as a more promising approach to enhance the accuracy of automated triage systems.
