Evaluating the Adherence of Large Language Models to Surgical Guidelines: A Comparative Analysis of Chatbot Recommendations and North American Spine Society (NASS) Coverage Criteria.

Authors

Sarikonda Advith, Isch Emily, Self Mitchell, Sambangi Abhijeet, Carreras Angeleah, Sivaganesan Ahilan, Harrop Jim, Jallo Jack

Affiliations

Department of Neurological Surgery, Thomas Jefferson University, Philadelphia, USA.

Department of General Surgery, Division of Plastic Surgery, Thomas Jefferson University Hospital, Philadelphia, USA.

Publication information

Cureus. 2024 Sep 3;16(9):e68521. doi: 10.7759/cureus.68521. eCollection 2024 Sep.

Abstract

Background

Cervical fusion procedures, both anterior and posterior, have increased significantly across the United States. Despite this upward trend, limited research exists on adherence to evidence-based medicine (EBM) guidelines for cervical fusion, highlighting a gap between recommended practices and surgeon preferences. Additionally, patients are increasingly using large language models (LLMs) to aid in decision-making.

Methodology

This observational study evaluated how well four LLMs, namely Bard, BingAI (Bing Chat), ChatGPT-3.5, and ChatGPT-4, adhere to EBM guidelines, specifically the 2023 North American Spine Society (NASS) cervical fusion guidelines. Ten clinical vignettes were created based on NASS recommendations to determine when fusion was indicated. This novel approach assessed LLM performance in a clinical decision-making context. Because no human subjects were involved, institutional review board approval was not required.

Results

No LLM achieved complete concordance with NASS guidelines, though ChatGPT-4 and Bing Chat exhibited the highest adherence at 60%. Discrepancies were most notable in scenarios involving head-drop syndrome and pseudoarthrosis, where all LLMs failed to align with NASS recommendations. Additionally, only 25% of LLMs (one of the four) agreed with NASS guidelines for fusion in cases of cervical radiculopathy and as an adjunct to facet cyst resection.

Conclusions

The study underscores the need for improved LLM training on clinical guidelines and emphasizes the importance of considering the nuances of individual patient cases. While LLMs hold promise for enhancing guideline adherence in cervical fusion decision-making, their current performance indicates a need for further refinement and integration with clinical expertise to ensure optimal patient care. This study contributes to understanding the role of AI in healthcare, advocating for a balanced approach that leverages technological advancements while acknowledging the complexities of surgical decision-making.
