Applying Large Language Models for Surgical Case Length Prediction.

Authors

Ramamurthi Adhitya, Neupane Bhabishya, Deshpande Priya, Hanson Ryan, Vegesna Srujan, Cray Deborah, Crotty Bradley H, Somai Melek, Brown Kellie R, Pawar Sachin S, Taylor Bradley, Kothari Anai N

Affiliations

Selig Hub for Surgical Data Science, Medical College of Wisconsin, Milwaukee.

Department of Surgery, Medical College of Wisconsin, Milwaukee.

Publication information

JAMA Surg. 2025 Jul 9. doi: 10.1001/jamasurg.2025.2154.

DOI: 10.1001/jamasurg.2025.2154
PMID: 40632526
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12242817/

Abstract

IMPORTANCE

Accurate prediction of surgical case duration is critical for operating room (OR) management, as inefficient scheduling can lead to reduced patient and surgeon satisfaction while incurring considerable financial costs.

OBJECTIVE

To evaluate the feasibility and accuracy of large language models (LLMs) in predicting surgical case length using unstructured clinical data compared to existing estimation methods.

DESIGN, SETTING, AND PARTICIPANTS

This was a retrospective study analyzing elective surgical cases performed between January 2017 and December 2023 at a single academic medical center and affiliated community hospital ORs. Analysis included 125 493 eligible surgical cases, with 1950 used for LLM fine-tuning and 2500 for evaluation. An additional 500 cases from a community site were used for external validation. Cases were randomly sampled using strata to ensure representation across surgical specialties.
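
The stratified random sampling described above can be illustrated with a minimal sketch. The abstract does not say how many cases were drawn per specialty or whether allocation was equal or proportional, so the equal per-stratum allocation, the `specialty` field name, and the function name `stratified_sample` are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_sample(cases, stratum_key, per_stratum, seed=0):
    """Draw up to `per_stratum` cases at random from each stratum.

    Equal allocation per stratum is an assumption of this sketch;
    the study's actual allocation scheme is not given in the abstract.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group cases by the value of the stratification field.
    by_stratum = defaultdict(list)
    for case in cases:
        by_stratum[case[stratum_key]].append(case)

    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        # Small strata contribute all of their cases.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical toy data: 6 general, 3 vascular, 1 ENT case.
cases = (
    [{"id": i, "specialty": "general"} for i in range(6)]
    + [{"id": 100 + i, "specialty": "vascular"} for i in range(3)]
    + [{"id": 200, "specialty": "ent"}]
)
picked = stratified_sample(cases, "specialty", per_stratum=2)
```

With this toy input every specialty appears in `picked`, including the one with a single case, which is the property stratification is meant to guarantee.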

EXPOSURES

Eleven LLMs, including base models (GPT-4, GPT-3.5, Mistral, Llama-3, Phi-3) and 2 fine-tuned variants (GPT-4 fine-tuned, GPT-3.5 fine-tuned), were used to predict surgical case length based on clinical notes.

MAIN OUTCOMES AND MEASURES

The primary outcome was average error between predicted and actual surgical case length (wheels-in to wheels-out time). The secondary outcome was prediction accuracy, defined as predicted length within 20% of actual duration.
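
The two outcome measures can be sketched in a few lines of Python. This is an illustrative computation under stated assumptions, not the authors' code: the function names are hypothetical, durations are assumed to be in minutes, and "within 20% of actual duration" is read as an absolute error no larger than 20% of the actual value.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual durations."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def accuracy_within_tolerance(predicted, actual, tolerance=0.20):
    """Share of cases whose prediction falls within `tolerance` of actual.

    Reading 'within 20%' as |predicted - actual| <= 0.20 * actual is an
    assumption of this sketch.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance * a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical wheels-in to wheels-out times, in minutes.
predicted = [100, 130, 60, 200]
actual = [120, 125, 90, 210]

mae = mean_absolute_error(predicted, actual)             # 16.25
accuracy = accuracy_within_tolerance(predicted, actual)  # 0.75
```

In the toy data the third case misses by 30 minutes against an 18-minute allowance, so three of four predictions count as accurate. The two measures can diverge: a model can have a lower average error while placing fewer cases inside the 20% band, or the reverse.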

RESULTS

Fine-tuned GPT-4 achieved the best performance with a mean absolute error (MAE) of 47.64 minutes (95% CI, 45.71-49.56) and R² of 0.61, matching the performance of current OR scheduling (MAE, 49.34 minutes; 95% CI, 47.60-51.09; R², 0.63; P = .10). Both GPT-4 fine-tuned and GPT-3.5 fine-tuned significantly outperformed current scheduling methods in accuracy (46.12% and 46.08% vs 40.92%, respectively; P < .001). GPT-4 fine-tuned outperformed all other models during external validation with similar performance metrics (MAE, 48.66 minutes; 95% CI, 45.31-52.00; accuracy, 46.0%). Base models demonstrated variable performance, with GPT-4 showing the highest performance among non-fine-tuned models (MAE, 59.20 minutes; 95% CI, 56.88-61.52).

CONCLUSION AND RELEVANCE

The findings in this study suggest that fine-tuned LLMs can predict surgical case length with accuracy comparable to or exceeding current institutional scheduling methods. This indicates potential for LLMs to enhance operating room efficiency through improved case length prediction using existing clinical documentation.
