Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University, Stanford, California.
Department of Anesthesiology & Pain Medicine, University of Washington, Seattle.
JAMA Surg. 2024 Aug 1;159(8):928-937. doi: 10.1001/jamasurg.2024.1621.
IMPORTANCE: General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient's electronic health record notes.
OBJECTIVE: To examine predictive performance on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration.
DESIGN, SETTING, AND PARTICIPANTS: This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health record data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and an explanation. The setting was a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients who underwent surgery or a procedure with anesthesia and had at least 1 clinician-written note filed in the electronic health record before surgery were included. Data were analyzed from November to December 2023.
EXPOSURES: The following prompting strategies were compared: original notes, note summaries, few-shot prompting, and chain-of-thought prompting.
MAIN OUTCOMES AND MEASURES: F1 score for binary and categorical outcomes and mean absolute error for numerical duration outcomes.
RESULTS: Study results were measured on task-specific datasets of 1000 cases each, except for unplanned admission (949 cases) and hospital mortality (576 cases). The best results for each task were an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. Performance on duration prediction tasks was universally poor across all prompt strategies: the large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction.
CONCLUSIONS AND RELEVANCE: Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for their predictions may make them useful tools in clinical workflows and complementary to traditional risk prediction models.
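The abstract reports F1 score for the binary and categorical tasks and mean absolute error (MAE) for the duration tasks. As a minimal sketch of how these two measures are defined (not the authors' evaluation code), the Python below computes binary F1, a macro-averaged F1 for a multiclass outcome such as ASA-PS, and MAE. Macro averaging is an assumption here, since the abstract does not state how the categorical F1 was averaged, and the function names and toy labels are hypothetical, invented for illustration rather than taken from the study.

```python
from collections.abc import Sequence


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one positive class: 2*TP / (2*TP + FP + FN), which equals the
    harmonic mean of precision and recall; returns 0.0 when there are no TPs."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def f1_macro(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either sequence.
    Macro averaging is an assumption (e.g. for ASA-PS classes)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_binary(y_true, y_pred, positive=c) for c in labels) / len(labels)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |true - predicted|, in the outcome's own unit (minutes or days)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data for illustration only; these are not values from the study.
print(round(f1_binary([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]), 3))       # 0.667 (e.g. ICU admission yes/no)
print(round(f1_macro([2, 3, 3, 4], [2, 3, 2, 4]), 3))              # 0.778 (e.g. ASA-PS classes)
print(round(mean_absolute_error([60, 90, 45], [50, 120, 45]), 1))  # 13.3 (e.g. PACU minutes)
```

Because MAE is expressed in the outcome's own unit, the reported 49 minutes for PACU phase 1 duration and 4.5 days for hospital duration read directly as the average size of the prediction error per case.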