Zhang Chi, Yang Hao, Liu Xingyun, Wu Rongrong, Zong Hui, Wu Erman, Zhou Yi, Li Jiakun, Shen Bairong
Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China.
Information Center and Department of Critical Care Medicine and Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu, China.
J Med Internet Res. 2025 Jun 6;27:e67201. doi: 10.2196/67201.
Sepsis is a severe syndrome of organ dysfunction caused by infection; it has high heterogeneity and high in-hospital mortality, representing a grim clinical challenge for precision medicine in critical care.
We aimed to extract reported sepsis biomarkers to provide users with comprehensive biomedical information, and to integrate retrieval-augmented generation (RAG) with prompt engineering to enhance the accuracy, stability, and interpretability of clinical decisions recommended by large language models (LLMs).
To address this challenge, we established and updated the first knowledge-enhanced platform, MetaSepsisKnowHub, which comprises 427 sepsis biomarkers from 423 studies and aims to systematically collect and annotate sepsis biomarkers to guide personalized clinical decision-making in the diagnosis and treatment of human sepsis. We built a tailored LLM framework that combines RAG with prompt engineering, and we applied 2 evaluation scales: the System Usability Scale and the Net Promoter Score.
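The retrieve-then-prompt pattern described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields, the placeholder texts, the keyword-overlap ranking, and the function names `retrieve` and `build_prompt` are all assumptions made for illustration, and a production system would more likely use embedding-based retrieval over the curated biomarker annotations.

```python
import re

# Hypothetical placeholder records; these are NOT MetaSepsisKnowHub entries.
RECORDS = [
    {"id": "rec-1", "text": "placeholder biomarker A reported for diagnosis"},
    {"id": "rec-2", "text": "placeholder biomarker B reported for prognosis"},
    {"id": "rec-3", "text": "unrelated note about hospital logistics"},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, records, k=2):
    """Rank records by token overlap with the query and keep the top k.

    Keyword overlap is an assumed stand-in for a real retriever.
    Records with no overlap are dropped.
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(r["text"])), r) for r in records]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [record for _, record in scored[:k]]


def build_prompt(query, retrieved):
    """Assemble a prompt that asks the model to answer only from the context."""
    context = "\n".join(f"[{r['id']}] {r['text']}" for r in retrieved)
    return (
        "Answer the question using only the context below, "
        "and cite record ids in brackets.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


question = "Which biomarker supports diagnosis?"
prompt = build_prompt(question, retrieve(question, RECORDS))
```

The resulting `prompt` string would then be sent to whichever LLM is under evaluation. Constraining the model to cited context is one common prompt-engineering choice for making recommendations traceable.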
The overall quantitative ratings of expert-reviewed clinical recommendations based on RAG surpassed the baseline responses generated by 4 LLMs, with a statistically significant improvement on textual questions (baseline vs RAG; GPT-4: mean 75.79, SD 7.11 vs mean 81.59, SD 9.87; P=.02; GPT-4o: mean 70.36, SD 7.63 vs mean 77.98, SD 13.26; P=.02; Qwen2.5-instruct: mean 77.08, SD 3.75 vs mean 85.46, SD 7.27; P<.001; and DeepSeek-R1: mean 77.67, SD 3.66 vs mean 86.42, SD 8.56; P<.001), but no statistically significant differences were observed in clinical scenarios. The RAG assessment score, which compares RAG-based responses with expert-provided benchmark answers, indicated higher factual correctness, accuracy, and knowledge recall than the baseline responses. After use, the average System Usability Scale score was 82.20 (SD 14.17) and the Net Promoter Score was 72, demonstrating high user satisfaction and loyalty.
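For readers unfamiliar with the two usability measures, the sketch below shows how they are conventionally scored. It assumes the standard 10-item System Usability Scale (responses 1 to 5, odd items positively worded, even items negatively worded) and the standard 0 to 10 Net Promoter question. The abstract does not state the exact questionnaire variants used, and the function names are illustrative.

```python
def sus_score(responses):
    """Score one respondent's standard 10-item SUS questionnaire (0-100).

    Odd-numbered items contribute (response - 1), even-numbered items
    contribute (5 - response); the sum is multiplied by 2.5.
    """
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("expected 10 responses, each between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1
        else:
            total += 5 - response
    return total * 2.5


def net_promoter_score(ratings):
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)
```

A study-level System Usability Scale result such as the one reported here is the mean of the per-respondent scores, whereas the Net Promoter Score is computed once over all respondents.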
We highlight the pioneering MetaSepsisKnowHub platform, and we show that combining MetaSepsisKnowHub with RAG can minimize limitations on precision and maximize the breadth of LLMs to shorten the bench-to-bedside distance, serving as a knowledge-enhanced paradigm for future application of artificial intelligence in critical care medicine.