Daccache Nicolas, Zako Joe, Morisson Louis, Laferrière-Langlois Pascal
Maisonneuve-Rosemont Hospital Research Centre, Université de Montréal, Montreal, QC, Canada.
Can J Anaesth. 2025 Jun 16. doi: 10.1007/s12630-025-02973-9.
PURPOSE: ChatGPT and other large language models (LLMs) have gained immense popularity since their commercial release in 2022, with applications in various sectors including health care. We sought to evaluate their deployment in anesthesiology and critical care in a systematic review. Our aim was to describe the integration of LLMs in the field by showcasing and categorizing their current applications, assessing their performance in patient care, and reviewing application-specific ethical and practical challenges in deployment.
METHODS: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we systematically searched PubMed®, Embase, the Cochrane Central Register of Controlled Trials, and Web of Science® from inception until 1 August 2024. We extracted all papers that investigated LLMs in anesthesiology or critical care and reported results. We segmented the literature into major themes and highlighted key findings and limitations.
RESULTS: From 480 retrieved articles, we included 45 papers. The evaluated models (GPT-4, GPT-3.5, Google Bard [now Gemini], LLaMA, and others) showed diverse applications in four segments: intensive care unit, patient education, medical education, and perioperative care. Large language models, especially newer models, are promising in predicting clinical scores, navigating simple clinical scenarios, and managing preoperative anxiety. Their performance remains below the clinician level in outcome prediction, complex clinical scenarios (e.g., airway management), board examinations, and generation of patient-directed documents, although newer models performed better than older ones.
CONCLUSION: While LLMs are not yet equipped to fully assist physicians in anesthesiology and critical care, they have significant potential, and their capabilities are rapidly improving. Supervised use for select tasks can streamline patient care. Further trials are warranted as new versions of models become available.
STUDY REGISTRATION: PROSPERO (CRD42024567380); first submitted 22 July 2024.