Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication.

Affiliations

Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University, Stanford, California.

Department of Anesthesiology & Pain Medicine, University of Washington, Seattle.

Publication information

JAMA Surg. 2024 Aug 1;159(8):928-937. doi: 10.1001/jamasurg.2024.1621.


DOI: 10.1001/jamasurg.2024.1621
PMID: 38837145
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11154375/
Abstract

IMPORTANCE: General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient's electronic health record notes.

OBJECTIVE: To examine predictive performance on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration.

DESIGN, SETTING, AND PARTICIPANTS: This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health records data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and explanation. The setting included a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients who had a surgery or procedure with anesthesia and at least 1 clinician-written note filed in the electronic health record before surgery were included in the study. Data were analyzed from November to December 2023.

EXPOSURES: Compared original notes, note summaries, few-shot prompting, and chain-of-thought prompting strategies.

MAIN OUTCOMES AND MEASURES: F1 score for binary and categorical outcomes. Mean absolute error for numerical duration outcomes.

RESULTS: Study results were measured on task-specific datasets, each with 1000 cases with the exception of unplanned admission, which had 949 cases, and hospital mortality, which had 576 cases. The best results for each task included an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. Performance on duration prediction tasks was universally poor across all prompt strategies for which the large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction.

CONCLUSIONS AND RELEVANCE: Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for the predictions may make them useful tools in clinical workflows and may be complementary to traditional risk prediction models.
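
For readers unfamiliar with the two outcome measures named in the abstract, the following is a minimal sketch of how an F1 score (for a binary outcome such as ICU admission) and a mean absolute error (for a duration outcome such as PACU phase 1 minutes) are conventionally computed. It is not the study's code. The function names and the toy labels are hypothetical, and the abstract does not say how F1 was averaged for the multi-class ASA-PS task or how the confidence intervals were obtained, so neither is shown here.

```python
from typing import Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for a binary outcome: harmonic mean of precision and recall.

    Labels are assumed to be 0 (negative) or 1 (positive).
    Returns 0.0 when there are no true or predicted positives.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data, invented purely for illustration (not from the study).
    icu_true = [1, 0, 1, 1, 0, 0]
    icu_pred = [1, 0, 0, 1, 1, 0]
    print(round(f1_binary(icu_true, icu_pred), 3))  # 0.667

    pacu_true_min = [60, 90, 45, 120]
    pacu_pred_min = [75, 80, 45, 100]
    print(mean_absolute_error(pacu_true_min, pacu_pred_min))  # 11.25
```

Read this way, the reported mean absolute error of 49 minutes for PACU phase 1 duration means that predictions were off by 49 minutes on average, in either direction.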
