
ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis.

Affiliations

Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Chicago Medical School at Rosalind Franklin University, North Chicago, IL, USA.

Publication Information

Eur Spine J. 2024 Nov;33(11):4182-4203. doi: 10.1007/s00586-024-08198-6. Epub 2024 Mar 15.


DOI: 10.1007/s00586-024-08198-6
PMID: 38489044
Abstract

BACKGROUND CONTEXT: Clinical guidelines, developed in concordance with the literature, are often used to guide surgeons' clinical decision-making. Recent advancements of large language models and artificial intelligence (AI) in the medical field come with exciting potential. OpenAI's generative AI model, ChatGPT, can quickly synthesize information and generate responses grounded in medical literature, which may prove to be a useful tool in clinical decision-making for spine care. The current literature has yet to investigate the ability of ChatGPT to assist clinical decision-making with regard to degenerative spondylolisthesis.

PURPOSE: The study aimed to compare ChatGPT's concordance with the recommendations set forth by the North American Spine Society (NASS) Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis and to assess ChatGPT's accuracy within the context of the most recent literature.

METHODS: ChatGPT-3.5 and 4.0 were prompted with questions from the NASS Clinical Guideline for the Diagnosis and Treatment of Degenerative Spondylolisthesis, and their recommendations were graded as "concordant" or "nonconcordant" relative to those put forth by NASS. A response was considered "concordant" when ChatGPT generated a recommendation that accurately reproduced all major points made in the NASS recommendation. Responses graded "nonconcordant" were further stratified into two subcategories, "insufficient" or "over-conclusive," to provide further insight into the grading rationale. Responses from GPT-3.5 and 4.0 were compared using Chi-squared tests.

RESULTS: ChatGPT-3.5 answered 13 of NASS's 28 clinical questions in concordance with NASS's guidelines (46.4%). The categorical breakdown is as follows: Definitions and Natural History (1/1, 100%), Diagnosis and Imaging (1/4, 25%), Outcome Measures for Medical Intervention and Surgical Treatment (0/1, 0%), Medical and Interventional Treatment (4/6, 66.7%), Surgical Treatment (7/14, 50%), and Value of Spine Care (0/2, 0%). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-3.5 generated a concordant response 66.7% of the time (6/9). However, ChatGPT-3.5's concordance dropped to 36.8% (7/19) when asked clinical questions on which NASS did not provide a clear recommendation. A further breakdown of ChatGPT-3.5's nonconcordant responses revealed that the vast majority of its inaccurate recommendations were "over-conclusive" (12/15, 80%) rather than "insufficient" (3/15, 20%). ChatGPT-4.0 answered 19 (67.9%) of the 28 questions in concordance with NASS guidelines (P = 0.177). When NASS indicated there was sufficient evidence to offer a clear recommendation, ChatGPT-4.0 generated a concordant response 66.7% of the time (6/9). ChatGPT-4.0's concordance held at 68.4% (13/19, P = 0.104) when asked clinical questions on which NASS did not provide a clear recommendation.

CONCLUSIONS: This study sheds light on the duality of LLM applications within clinical settings: accuracy and utility in some contexts versus inaccuracy and risk in others. ChatGPT was concordant for most clinical questions on which NASS offered recommendations. However, for questions on which NASS did not offer best practices, ChatGPT generated answers that were either too general or inconsistent with the literature, and it even fabricated data and citations. Clinicians should therefore exercise extreme caution when consulting ChatGPT for clinical recommendations, taking care to verify its reliability against recent literature.
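The Methods state that GPT-3.5 and 4.0 responses were compared with Chi-squared tests, and the Results give the underlying counts (13/28 vs. 19/28 concordant). A minimal stdlib-only sketch of that comparison, assuming a 2x2 contingency table with Yates' continuity correction (the correction is an assumption; the paper does not say which variant was used, but this one reproduces the reported P = 0.177):

```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Yates-corrected chi-squared test for a 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With df = 1, the chi-squared survival
    function P(X > x) equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row = [a + b, c + d]
    col = [a + c, b + d]
    stat = 0.0
    for obs, r, cl in [(a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)]:
        exp = row[r] * col[cl] / n  # expected count under independence
        stat += (abs(obs - exp) - 0.5) ** 2 / exp  # Yates' correction
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Concordant / nonconcordant counts from the Results section:
# GPT-3.5: 13 of 28 concordant; GPT-4.0: 19 of 28 concordant.
stat, p = chi2_yates_2x2(13, 15, 19, 9)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")  # p = 0.177, matching the abstract
```

This is only an illustration of the test described in the abstract, not the authors' analysis code; without the continuity correction the statistic would be 2.625 and the p-value smaller, which is why the corrected variant is assumed here.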


Similar Articles

[1]
ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis.

Eur Spine J. 2024-11

[2]
Guideline summary review: An evidence-based clinical guideline for the diagnosis and treatment of degenerative lumbar spondylolisthesis.

Spine J. 2016-3

[3]
Performance of ChatGPT on NASS Clinical Guidelines for the Diagnosis and Treatment of Low Back Pain: A Comparison Study.

Spine (Phila Pa 1976). 2024-5-1

[4]
ChatGPT and its Role in the Decision-Making for the Diagnosis and Treatment of Lumbar Spinal Stenosis: A Comparative Analysis and Narrative Review.

Global Spine J. 2024-4

[5]
Thromboembolic prophylaxis in spine surgery: an analysis of ChatGPT recommendations.

Spine J. 2023-11

[6]
An evidence-based clinical guideline for the diagnosis and treatment of degenerative lumbar spondylolisthesis.

Spine J. 2009-7

[7]
An analysis of ChatGPT recommendations for the diagnosis and treatment of cervical radiculopathy.

J Neurosurg Spine. 2024-9-1

[8]
Use of ChatGPT for Determining Clinical and Surgical Treatment of Lumbar Disc Herniation With Radiculopathy: A North American Spine Society Guideline Comparison.

Neurospine. 2024-3

[9]
Guideline summary review: an evidence-based clinical guideline for the diagnosis and treatment of adult isthmic spondylolisthesis.

Spine J. 2016-12

[10]
Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery.

Neurospine. 2024-3

Cited By

[1]
Assessing LLMs on IDSA Practice Guidelines for the Diagnosis and Treatment of Native Vertebral Osteomyelitis: A Comparison Study.

J Clin Med. 2025-7-15

[2]
Accuracy of ChatGPT-3.5, ChatGPT-4o, Copilot, Gemini, Claude, and Perplexity in advising on lumbosacral radicular pain against clinical practice guidelines: cross-sectional study.

Front Digit Health. 2025-6-27

[3]
Large Language Models in Spine Surgery: A Promising Technology.

HSS J. 2025-5-29

[4]
Letter to the editor concerning "ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis" by Ahmed W, et al. (Eur spine J [2024]: doi:10.1007/s00586-024-08198-6).

Eur Spine J. 2025-5-17

[5]
A cross-sectional study on ChatGPT's alignment with clinical practice guidelines in musculoskeletal rehabilitation.

BMC Musculoskelet Disord. 2025-4-24

[6]
A comparative analysis between ChatGPT versus NASS clinical guidelines for adult isthmic spondylolisthesis.

N Am Spine Soc J. 2025-2-22

[7]
Evaluation of Large Language Models' Concordance With Guidelines on Olfaction.

Laryngoscope Investig Otolaryngol. 2025-3-22

[8]
Can generative artificial intelligence provide accurate medical advice?: a case of ChatGPT versus Congress of Neurological Surgeons management of acute cervical spine and spinal cord injuries clinical guidelines.

Asian Spine J. 2025-3-4

[9]
Evaluating Artificial Intelligence in Spinal Cord Injury Management: A Comparative Analysis of ChatGPT-4o and Google Gemini Against American College of Surgeons Best Practices Guidelines for Spine Injury.

Global Spine J. 2025-2-17

[10]
Evaluation of GPT-4 concordance with north American spine society guidelines for lumbar fusion surgery.

N Am Spine Soc J. 2024-12-27

