

Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study.

Authors

Chen Ying-Hao, Ruan Shanq-Jang, Chen Pei-Fu

Affiliations

Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan.

Department of Anesthesiology, Far Eastern Memorial Hospital, New Taipei City, Taiwan.

Publication information

J Med Internet Res. 2025 Jun 3;27:e75052. doi: 10.2196/75052.

DOI:10.2196/75052
PMID:40460423
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12174870/
Abstract

BACKGROUND

Accurately assessing perioperative risk is critical for informed surgical planning and patient safety. However, current prediction models often rely on structured data and overlook the nuanced clinical reasoning embedded in free-text preoperative notes. Recent advances in large language models (LLMs) have opened opportunities for harnessing unstructured clinical data, yet their application in perioperative prediction remains limited by concerns about factual accuracy. Retrieval-augmented generation (RAG) offers a promising solution: it enhances LLM performance by grounding outputs in domain-specific knowledge sources, potentially improving both predictive accuracy and clinical interpretability.

OBJECTIVE

This study aimed to investigate whether integrating LLMs with RAG can improve the prediction of 30-day postoperative mortality and American Society of Anesthesiologists (ASA) physical status classification using unstructured preoperative clinical notes.

METHODS

We conducted a retrospective cohort study using 24,491 medical records from a tertiary medical center, including preoperative anesthesia assessments, discharge summaries, and surgical information. To extract clinical insights from free-text data, we used the LLaMA 3.1-8B language model with RAG, using MedEmbed for text embedding and Miller's Anesthesia as the primary retrieval source. We evaluated model performance under various configurations, including embedding models, chunk sizes, and few-shot prompting. Machine learning (ML) models, including random forest, support vector machines (SVM), Extreme Gradient Boosting (XGBoost), and logistic regression, were trained on structured features as baselines.
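
The retrieval and prompting steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the bag-of-words `embed` function is a toy stand-in for an embedding model such as MedEmbed, the reference text is an invented placeholder rather than content from Miller's Anesthesia, and every function name and the prompt layout are assumptions made for illustration. The sketch only shows how chunk size, retrieval, and few-shot examples fit together before a prompt is sent to a language model.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(note, retrieved, examples):
    """Assemble retrieved passages, few-shot examples, and the target note."""
    parts = ["Reference passages:"]
    parts += [f"- {passage}" for passage in retrieved]
    parts.append("Examples:")
    parts += [f"Note: {n}\nASA class: {label}" for n, label in examples]
    parts.append(f"Note: {note}\nASA class:")
    return "\n".join(parts)


# Invented placeholder reference text and notes, for illustration only.
reference = ("healthy patient with no known systemic disease "
             "patient with severe systemic disease limiting activity "
             "moribund patient unlikely to survive without operation")
chunks = chunk_text(reference, chunk_size=7)
note = "severe heart disease limiting daily activity"
few_shot = [("healthy adult scheduled for elective surgery", "I")]
print(build_prompt(note, retrieve(note, chunks, k=1), few_shot))
```

In this sketch, dropping the `retrieved` passages or the `examples` from `build_prompt` would loosely correspond to the ablation settings (without RAG, without few-shot prompting) that the study compares.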

RESULTS

A total of 520 (2.1%) patients experienced in-hospital 30-day postoperative mortality. The ASA physical status distribution was as follows: class I: 535 (2.2%); class II: 15,272 (62.4%); class III: 8,024 (32.8%); class IV: 606 (2.5%); and class V: 54 (0.22%). For 30-day postoperative mortality prediction, the LLaMA-RAG model achieved an F-score of 0.4663 (95% CI 0.4654-0.4672), versus 0.2369 (95% CI 0.2341-0.2397) without few-shot prompting, 0.0879 (95% CI 0.0717-0.1041) without RAG, and 0.0436 (95% CI 0.0292-0.0580) without either few-shot prompting or RAG. Among ML models, XGBoost scored 0.4459 (95% CI 0.4176-0.4742); random forest, 0.3953 (95% CI 0.3791-0.4115); logistic regression, 0.2720 (95% CI 0.2647-0.2793); and SVM, 0.2474 (95% CI 0.2275-0.2673). For ASA classification, LLaMA-RAG achieved a micro F-score of 0.8409 (95% CI 0.8238-0.8551) versus 0.6546 (95% CI 0.6430-0.6796) without few-shot prompting, 0.6340 (95% CI 0.6157-0.6535) without RAG, and 0.4238 (95% CI 0.3952-0.4490) without either few-shot prompting or RAG. In comparison, XGBoost achieved 0.8273 (95% CI 0.8209-0.8498); logistic regression, 0.7940 (95% CI 0.7671-0.7950); random forest, 0.7847 (95% CI 0.7637-0.7868); and SVM, 0.7697 (95% CI 0.7637-0.7697). Notably, the model demonstrated exceptional sensitivity in identifying rare but high-risk cases, such as ASA class V patients and postoperative deaths.
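
For readers unfamiliar with the metrics reported above, the following sketch computes a binary F-score and a micro-averaged F-score from label lists. The abstract does not say which F-score variant was used, so this assumes the common F1 (the harmonic mean of precision and recall). The label lists are invented toy data, and the function names are illustrative.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 for one positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented toy labels, for illustration only (1 = death within 30 days).
died_true = [1, 0, 0, 1, 0, 0, 0, 1]
died_pred = [1, 0, 1, 0, 0, 0, 0, 1]
print(f1_binary(died_true, died_pred))

asa_true = ["II", "II", "III", "I", "IV", "III"]
asa_pred = ["II", "III", "III", "I", "III", "III"]
print(micro_f1(asa_true, asa_pred))
```

When every patient receives exactly one predicted class, micro-averaged F1 equals plain accuracy, because each misclassification counts once as a false positive and once as a false negative. Under that assumption, and if the evaluation data mirrored the reported class distribution, always predicting class II would score about 15,272/24,491 ≈ 0.62, which gives some context for the ASA classification results.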

CONCLUSIONS

The LLaMA-RAG model significantly improved the prediction of postoperative mortality and ASA classification, especially for rare high-risk cases. By grounding outputs in domain knowledge, retrieval-augmented generation enhanced both accuracy and prompt-driven interpretability over ML and ablation models, highlighting its promise for real-world clinical decision support.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bfc/12174870/8164919e90c4/jmir_v27i1e75052_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bfc/12174870/9b7281cba3f2/jmir_v27i1e75052_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bfc/12174870/72e47a2a37ed/jmir_v27i1e75052_fig2.jpg