Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery.

Author Information

Zaidat Bashar, Shrestha Nancy, Rosenberg Ashley M, Ahmed Wasil, Rajjoub Rami, Hoang Timothy, Mejia Mateo Restrepo, Duey Akiro H, Tang Justin E, Kim Jun S, Cho Samuel K

Affiliation

Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication Information

Neurospine. 2024 Mar;21(1):128-146. doi: 10.14245/ns.2347310.655. Epub 2024 Mar 31.

Abstract

OBJECTIVE

Large language models, such as ChatGPT (Chat Generative Pre-trained Transformer), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of two ChatGPT models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing their responses on antibiotic prophylaxis in spine surgery with accepted clinical guidelines.

METHODS

The ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-Based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Their responses were then compared with the guideline recommendations and assessed for accuracy.
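The abstract does not describe the querying procedure in detail. The sketch below is a minimal illustration, not the authors' actual protocol, of how guideline questions could be submitted to both models through the OpenAI Python SDK and collected for manual grading; the model identifiers, the example question, and the response store are illustrative assumptions.

    # Minimal illustrative sketch, assuming the OpenAI Python SDK (openai >= 1.0)
    # and an OPENAI_API_KEY in the environment; not the study's actual protocol.
    from openai import OpenAI

    client = OpenAI()

    # Hypothetical example question; the study used the 16 questions from the
    # 2013 NASS antibiotic prophylaxis guideline.
    questions = [
        "Does preoperative antibiotic prophylaxis reduce the rate of surgical "
        "site infection in patients undergoing spine surgery?",
    ]

    models = ["gpt-3.5-turbo", "gpt-4"]  # stand-ins for GPT-3.5 and GPT-4.0

    responses = {}
    for model in models:
        for question in questions:
            reply = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
            )
            # Store the raw text; accuracy against the NASS recommendation is
            # judged afterward by human reviewers.
            responses[(model, question)] = reply.choices[0].message.content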

RESULTS

Of the 16 NASS guideline questions concerning antibiotic prophylaxis, ChatGPT's GPT-3.5 model answered 10 (62.5%) accurately and GPT-4.0 answered 13 (81%) accurately. Twenty-five percent of GPT-3.5 answers were deemed overly confident, while 62.5% of GPT-4.0 answers directly used the NASS guideline as evidence for their responses.
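As a quick check on these figures, assuming all percentages are taken over the full set of 16 questions: 10/16 = 62.5% and 13/16 ≈ 81.3% (reported as 81%), while 25% and 62.5% of 16 correspond to 4 and 10 answers, respectively.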

CONCLUSION

ChatGPT demonstrated an impressive ability to answer clinical questions accurately. The GPT-3.5 model's performance was limited by its tendency to give overly confident responses and its failure to identify the most significant elements of its responses. The GPT-4.0 model's responses were more accurate and frequently cited the NASS guideline as direct evidence. While GPT-4.0 is still far from perfect, it showed an exceptional ability, compared with GPT-3.5, to extract the most relevant available research. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31c6/10992653/22dac55b3080/ns-2347310-655f1.jpg
