Detecting emergencies in patient portal messages using large language models and knowledge graph-based retrieval-augmented generation.

Author information

Liu Siru, Wright Aileen P, McCoy Allison B, Huang Sean S, Steitz Bryan, Wright Adam

Affiliations

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37212, United States.

Department of Computer Science, Vanderbilt University, Nashville, TN 37240, United States.

Publication information

J Am Med Inform Assoc. 2025 Jun 1;32(6):1032-1039. doi: 10.1093/jamia/ocaf059.

DOI:10.1093/jamia/ocaf059

PMID:40220286

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12089757/

Abstract

OBJECTIVES

This study aims to develop and evaluate an approach that uses large language models (LLMs) and a knowledge graph to identify patient messages that indicate a need for emergency care. The goal is to notify patients when their messages indicate an emergency and guide them to seek immediate help rather than use the patient portal, thereby improving patient safety.

MATERIALS AND METHODS

We selected 1020 messages sent to Vanderbilt University Medical Center providers between January 1, 2022 and March 7, 2023. We developed four models to triage these messages for emergencies: (1) Prompt-Only: the patient message was input with a prompt directly into the LLM; (2) Naïve Retrieval Augmented Generation (RAG): provided retrieved information as context to the LLM; (3) RAG from Knowledge Graph with Local Search: a knowledge graph was used to retrieve locally relevant information based on semantic similarities; (4) RAG from Knowledge Graph with Global Search: a knowledge graph was used to retrieve globally relevant information through hierarchical community detection. The knowledge base was a triage book covering 225 protocols.
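
The difference between the Prompt-Only and Naïve RAG setups can be illustrated with a minimal sketch. It is not the study's implementation: the protocol snippets, the bag-of-words cosine retriever, the prompt wording, and the names `PROTOCOLS`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and no LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical protocol snippets; the triage book used in the study is not reproduced here.
PROTOCOLS = {
    "Chest pain": "Chest pain with shortness of breath or sweating: call emergency services now.",
    "Medication refill": "Routine medication refill requests can be handled through the portal.",
    "Rash": "Mild rash without fever: schedule a routine appointment.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(message, k=1):
    """Return the k protocols most similar to the message (toy stand-in for a real retriever)."""
    query = Counter(tokenize(message))
    ranked = sorted(
        PROTOCOLS.items(),
        key=lambda item: cosine(query, Counter(tokenize(item[0] + " " + item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(message, context=None):
    """Prompt-only when context is None; naive RAG when retrieved protocols are passed in."""
    parts = ["Does this patient message describe an emergency? Answer yes or no."]
    if context:
        parts.append("Relevant triage protocols:")
        parts.extend(f"- {title}: {text}" for title, text in context)
    parts.append(f"Patient message: {message}")
    return "\n".join(parts)


message = "I have chest pain and shortness of breath since this morning"
print(build_prompt(message))                     # prompt-only variant
print(build_prompt(message, retrieve(message)))  # naive RAG variant
```

The two knowledge graph variants would replace `retrieve` with a graph-based lookup; that part is not sketched here because the abstract does not describe its details.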

RESULTS

The RAG from Knowledge Graph model with global search outperformed other models, achieving an accuracy of 0.99, a sensitivity of 0.98, and a specificity of 0.99. It demonstrated significant improvements in triaging emergency messages compared to LLM without RAG and naïve RAG.
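
For reference, accuracy, sensitivity, and specificity follow from the counts of a binary confusion matrix. The sketch below uses made-up labels, not the study's data, and the function name `triage_metrics` is hypothetical.

```python
def triage_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = emergency)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        # Share of all messages classified correctly.
        "accuracy": (tp + tn) / total if total else None,
        # Share of true emergencies that were flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Share of non-emergencies that were correctly left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Illustrative labels only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(triage_metrics(y_true, y_pred))
```

In a triage setting, sensitivity is the share of true emergencies that are flagged, so a missed emergency lowers sensitivity while a false alarm lowers specificity.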

DISCUSSION

The traditional LLM without any retrieval mechanism underperformed compared to models with RAG, which aligns with the expected benefits of augmenting LLMs with domain-specific knowledge sources. Our results suggest that providing external knowledge, especially in a structured manner and in community summaries, can improve LLM performance in triaging patient portal messages.

CONCLUSION

LLMs can effectively assist in triaging emergency patient messages when integrated with a knowledge graph built from a nurse triage book. Future research should focus on expanding the knowledge graph and deploying the system to evaluate its impact on patient outcomes.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bc5/12089757/1fbc988e08e6/ocaf059f1.jpg
