Applying Large Language Models for Surgical Case Length Prediction.

Author information

Ramamurthi Adhitya, Neupane Bhabishya, Deshpande Priya, Hanson Ryan, Vegesna Srujan, Cray Deborah, Crotty Bradley H, Somai Melek, Brown Kellie R, Pawar Sachin S, Taylor Bradley, Kothari Anai N

Affiliations

Selig Hub for Surgical Data Science, Medical College of Wisconsin, Milwaukee.

Department of Surgery, Medical College of Wisconsin, Milwaukee.

Publication information

JAMA Surg. 2025 Jul 9. doi: 10.1001/jamasurg.2025.2154.

Abstract

IMPORTANCE

Accurate prediction of surgical case duration is critical for operating room (OR) management, as inefficient scheduling can lead to reduced patient and surgeon satisfaction while incurring considerable financial costs.

OBJECTIVE

To evaluate the feasibility and accuracy of large language models (LLMs) in predicting surgical case length using unstructured clinical data compared to existing estimation methods.

DESIGN, SETTING, AND PARTICIPANTS

This was a retrospective study analyzing elective surgical cases performed between January 2017 and December 2023 at a single academic medical center and affiliated community hospital ORs. Analysis included 125 493 eligible surgical cases, with 1950 used for LLM fine-tuning and 2500 for evaluation. An additional 500 cases from a community site were used for external validation. Cases were randomly sampled using strata to ensure representation across surgical specialties.
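As an illustration only, stratified random sampling by specialty might look like the sketch below. The abstract does not describe the allocation rule, so proportional allocation with at least one case per stratum is an assumption, and the function name, field names, and case counts are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(cases, key, n_total, seed=0):
    """Draw roughly n_total cases, sampling each stratum in proportion to its size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for case in cases:
        strata[key(case)].append(case)
    sample = []
    for group in strata.values():
        # Assumed rule: proportional allocation, at least one case per stratum.
        k = max(1, round(n_total * len(group) / len(cases)))
        sample.extend(rng.sample(group, min(k, len(group))))
    return sample

# Hypothetical case list: 60 orthopedic, 30 urology, 10 otolaryngology cases.
cases = (
    [{"id": i, "specialty": "orthopedics"} for i in range(60)]
    + [{"id": 60 + i, "specialty": "urology"} for i in range(30)]
    + [{"id": 90 + i, "specialty": "otolaryngology"} for i in range(10)]
)
sample = stratified_sample(cases, key=lambda c: c["specialty"], n_total=10)
```

Because each stratum's share is rounded independently, the total can differ slightly from `n_total` when the proportions are not whole numbers.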

EXPOSURES

Eleven LLMs, including base models (GPT-4, GPT-3.5, Mistral, Llama-3, Phi-3) and 2 fine-tuned variants (GPT-4 fine-tuned, GPT-3.5 fine-tuned), were used to predict surgical case length based on clinical notes.

MAIN OUTCOMES AND MEASURES

The primary outcome was average error between predicted and actual surgical case length (wheels-in to wheels-out time). The secondary outcome was prediction accuracy, defined as predicted length within 20% of actual duration.
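The two outcome measures can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes the average error is a mean absolute error in minutes and that the 20% band is measured relative to the actual duration. The function names and example durations are hypothetical.

```python
from statistics import mean

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual durations."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))

def within_tolerance_accuracy(predicted, actual, tolerance=0.20):
    """Share of cases whose prediction falls within tolerance * actual duration."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a)
    return hits / len(actual)

# Hypothetical wheels-in to wheels-out durations, in minutes.
predicted = [100, 150, 60, 230]
actual = [120, 140, 90, 200]

mae = mean_absolute_error(predicted, actual)             # 22.5 minutes
accuracy = within_tolerance_accuracy(predicted, actual)  # 0.75 (3 of 4 cases)
```

The third case misses the band: a 30-minute error on a 90-minute case exceeds the 18-minute (20%) allowance, while the same 30-minute error on the 200-minute case does not.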

RESULTS

Fine-tuned GPT-4 achieved the best performance with a mean absolute error (MAE) of 47.64 minutes (95% CI, 45.71-49.56) and R² of 0.61, matching the performance of current OR scheduling (MAE, 49.34 minutes; 95% CI, 47.60-51.09; R², 0.63; P = .10). Both GPT-4 fine-tuned and GPT-3.5 fine-tuned significantly outperformed current scheduling methods in accuracy (46.12% and 46.08% vs 40.92%, respectively; P < .001). GPT-4 fine-tuned outperformed all other models during external validation with similar performance metrics (MAE, 48.66 minutes; 95% CI, 45.31-52.00; accuracy, 46.0%). Base models demonstrated variable performance, with GPT-4 showing the highest performance among non-fine-tuned models (MAE, 59.20 minutes; 95% CI, 56.88-61.52).

CONCLUSION AND RELEVANCE

The findings in this study suggest that fine-tuned LLMs can predict surgical case length with accuracy comparable to or exceeding current institutional scheduling methods. This indicates potential for LLMs to enhance operating room efficiency through improved case length prediction using existing clinical documentation.
