Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication.

Author affiliations

Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University, Stanford, California.

Department of Anesthesiology & Pain Medicine, University of Washington, Seattle.

Publication information

JAMA Surg. 2024 Aug 1;159(8):928-937. doi: 10.1001/jamasurg.2024.1621.

DOI:10.1001/jamasurg.2024.1621

PMID:38837145

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11154375/

Abstract

IMPORTANCE

General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient's electronic health record notes.

OBJECTIVE

To examine predictive performance on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration.

DESIGN, SETTING, AND PARTICIPANTS

This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health records data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and explanation. The setting included a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients who had a surgery or procedure with anesthesia and at least 1 clinician-written note filed in the electronic health record before surgery were included in the study. Data were analyzed from November to December 2023.

EXPOSURES

Compared original notes, note summaries, few-shot prompting, and chain-of-thought prompting strategies.
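To make the prompting strategies concrete, the following is a minimal sketch of how a case might be turned into a zero-shot, few-shot, or chain-of-thought prompt. It is not the study's actual prompt template: the function name `build_prompt`, the dictionary keys (`procedure`, `notes`, `answer`), and all wording inside the prompt are assumptions made for illustration only.

```python
def build_prompt(case, task, examples=(), chain_of_thought=False):
    """Assemble a plain-text prompt for one prediction task.

    `case` and each item of `examples` are dicts with hypothetical keys
    "procedure", "notes", and (for examples only) "answer".
    """
    parts = [f"Task: predict {task} for the patient described below."]

    # Few-shot prompting: prepend solved cases so the model sees worked answers.
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\n"
            f"Procedure: {ex['procedure']}\n"
            f"Notes: {ex['notes']}\n"
            f"Answer: {ex['answer']}"
        )

    # The case to be predicted; "notes" may hold original notes or a summary.
    parts.append(f"Procedure: {case['procedure']}\nNotes: {case['notes']}")

    # Chain-of-thought prompting: ask for reasoning before the final answer.
    if chain_of_thought:
        parts.append(
            "Think step by step, then give the final answer "
            "on a line starting with 'Answer:'."
        )
    else:
        parts.append("Answer:")

    return "\n\n".join(parts)
```

Passing note summaries instead of original notes only changes what is stored under `notes`, so the same builder covers all four strategies compared here.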

MAIN OUTCOMES AND MEASURES

F1 score for binary and categorical outcomes. Mean absolute error for numerical duration outcomes.
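As a reminder of what these two metrics measure, here is a minimal sketch in plain Python. The F1 score is the harmonic mean of precision and recall for a chosen positive class, and the mean absolute error is the average magnitude of the prediction error. The function names are illustrative, and the abstract does not say how F1 was averaged across classes for the multiclass ASA-PS task, so only the single-class form is shown.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, `f1_score([1, 1, 0, 0], [1, 0, 1, 0], positive=1)` gives 0.5 (one true positive, one false positive, one false negative), and `mean_absolute_error([60, 30], [45, 50])` gives 17.5.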

RESULTS

Study results were measured on task-specific datasets, each with 1000 cases with the exception of unplanned admission, which had 949 cases, and hospital mortality, which had 576 cases. The best results for each task included an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. Performance on duration prediction tasks was universally poor across all prompt strategies for which the large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction.
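The abstract reports 95% confidence intervals but does not state how they were computed. One common approach, shown here purely as an assumption and not as the authors' method, is the percentile bootstrap: resample the cases with replacement many times, recompute the metric on each resample, and take the 2.5th and 97.5th percentiles. The function name and its parameters are hypothetical.

```python
import random


def bootstrap_ci(values, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Recompute the statistic on n_resamples samples drawn with replacement.
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mean(xs):
    return sum(xs) / len(xs)
```

Calling `bootstrap_ci(abs_errors, mean)` on a list of per-case absolute errors would give an interval for the mean absolute error. For F1, the resampled items would be (actual, predicted) pairs and `stat` would compute F1 on each resample.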

CONCLUSIONS AND RELEVANCE

Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for the predictions may make them useful tools in clinical workflows and may be complementary to traditional risk prediction models.
