Mehmet Saylan, Elmarawany Mohamed Nabil, Harding Ian, Bowey Andrew James, Andrews John, Chan Daniel, Jayasuriya Raveen, Srinivas Shreya, Tomlinson James, Bayley Edward, Grevitt Michael Paul, James Stuart, Jones Alwyn, McCarthy Michael J H
Cardiff University, Cardiff, UK.
North Bristol NHS Trust, Bristol, UK.
Eur Spine J. 2025 Apr 3. doi: 10.1007/s00586-025-08825-w.
The use of artificial intelligence (AI) in spinal surgery is expanding, yet its ability to match the diagnostic and treatment planning accuracy of human surgeons remains unclear. This study aims to compare the performance of AI models (ChatGPT-3.5, ChatGPT-4, and Google Bard) with that of experienced spinal surgeons in controversial spinal scenarios.
A questionnaire comprising 54 questions was presented to ten spinal surgeons on two occasions, four weeks apart, to assess consistency. The same questionnaire was also presented to ChatGPT-3.5, ChatGPT-4, and Google Bard, each generating five responses per question. Responses were analyzed for consistency and agreement with human surgeons using Kappa values. Thematic analysis of AI responses identified common themes and evaluated the depth and accuracy of AI recommendations.
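The abstract does not specify which Kappa statistic was used; a common choice for two raters with categorical responses is Cohen's Kappa, which corrects observed agreement for the agreement expected by chance. A minimal sketch of that calculation, assuming unweighted Cohen's Kappa over categorical answers (the function name and example labels are illustrative, not from the study):

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's Kappa for two raters over categorical items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's marginal frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be equal-length, non-empty sequences")
    n = len(ratings_a)
    # Observed proportion of items where the two raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters constant and identical
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two raters answering six categorical questions.
surgeon = ["A", "A", "B", "B", "A", "B"]
model   = ["A", "B", "B", "B", "A", "A"]
print(round(cohen_kappa(surgeon, model), 3))
```

By the conventional Landis and Koch scale, values of 0.21 to 0.40 are "fair" agreement, which is how the study characterizes the Kappa values of 0.24 and 0.27 reported below.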
Test-retest reliability among surgeons showed Kappa values from 0.535 to 1.00, indicating moderate to perfect reliability. Inter-rater agreement between surgeons and AI models was generally low, with nonsignificant p-values. Fair agreement was observed between the surgeons' second-occasion responses and ChatGPT-3.5 (Kappa = 0.24) and ChatGPT-4 (Kappa = 0.27). AI responses were detailed and structured, while surgeons provided more concise answers.
AI large language models are not yet suitable for complex spinal surgery decisions but hold potential for preliminary information gathering and emergency triage. Legal, ethical, and accuracy issues must be addressed before AI can be reliably integrated into clinical practice.