

AI versus the spinal surgeons in the management of controversial spinal surgery scenarios.

Author information

Mehmet Saylan, Elmarawany Mohamed Nabil, Harding Ian, Bowey Andrew James, Andrews John, Chan Daniel, Jayasuriya Raveen, Srinivas Shreya, Tomlinson James, Bayley Edward, Grevitt Michael Paul, James Stuart, Jones Alwyn, McCarthy Michael J H

Affiliations

Cardiff University, Cardiff, UK.

North Bristol NHS Trust, Bristol, UK.

Publication information

Eur Spine J. 2025 Apr 3. doi: 10.1007/s00586-025-08825-w.

Abstract

AIMS

The use of artificial intelligence (AI) in spinal surgery is expanding, yet its ability to match the diagnostic and treatment planning accuracy of human surgeons remains unclear. This study aims to compare the performance of AI models (ChatGPT-3.5, ChatGPT-4, and Google Bard) with that of experienced spinal surgeons in controversial spinal scenarios.

METHODS

A questionnaire comprising 54 questions was presented to ten spinal surgeons on two occasions, four weeks apart, to assess consistency. The same questionnaire was also presented to ChatGPT-3.5, ChatGPT-4, and Google Bard, each generating five responses per question. Responses were analyzed for consistency and agreement with human surgeons using Kappa values. Thematic analysis of AI responses identified common themes and evaluated the depth and accuracy of AI recommendations.
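The agreement analysis above relies on Kappa statistics. As an illustration only (the abstract does not specify which variant or weighting the authors used), the following is a minimal sketch of unweighted Cohen's kappa for two raters' categorical responses:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical responses."""
    assert len(rater_a) == len(rater_b) and rater_a, "need paired, non-empty ratings"
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: a single shared category
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: one surgeon's answers on two occasions.
occasion_1 = ["operate", "conserve", "operate", "operate"]
occasion_2 = ["operate", "conserve", "conserve", "operate"]
print(round(cohens_kappa(occasion_1, occasion_2), 3))
```

On conventional interpretive scales, values of roughly 0.21-0.40 are read as "fair" agreement and values above 0.80 as near-perfect, which is how the study's figures below are characterised.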

RESULTS

Test-retest reliability among surgeons showed Kappa values from 0.535 to 1.00, indicating moderate to perfect reliability. Inter-rater agreement between surgeons and AI models was generally low, with nonsignificant p-values. Fair agreements were observed between surgeons' second occasion responses and ChatGPT-3.5 (Kappa = 0.24) and ChatGPT-4 (Kappa = 0.27). AI responses were detailed and structured, while surgeons provided more concise answers.

CONCLUSIONS

AI large language models are not yet suitable for complex spinal surgery decisions but hold potential for preliminary information gathering and emergency triage. Legal, ethical, and accuracy issues must be addressed before AI can be reliably integrated into clinical practice.

