Comparative Analysis of Large Language Models and Spine Surgeons in Surgical Decision-Making and Radiological Assessment for Spine Pathologies.

Authors

Almekkawi Ahmad K, Caruso James P, Anand Soummitra, Hawkins Angela M, Rauf Rayaan, Al-Shaikhli Mayar, Aoun Salah G, Bagley Carlos A

Affiliations

Saint Luke's Marion Bloch Neuroscience Institute Department of Neurosurgery, Kansas City, Missouri, USA.

The University of Texas Southwestern Department of Neurosurgery, Dallas, Texas, USA.

Publication information

World Neurosurg. 2025 Feb;194:123531. doi: 10.1016/j.wneu.2024.11.114. Epub 2024 Dec 23.

DOI: 10.1016/j.wneu.2024.11.114
PMID: 39622288

Abstract

OBJECTIVE

This study aimed to investigate the accuracy of large language models (LLMs), specifically ChatGPT and Claude, in surgical decision-making and radiological assessment for spine pathologies compared to experienced spine surgeons.

METHODS

The study employed a comparative analysis between the LLMs and a panel of attending spine surgeons. Five written clinical scenarios encompassing various spine pathologies were presented to the LLMs and surgeons, who provided recommended surgical treatment plans. Additionally, magnetic resonance imaging images depicting spine pathologies were analyzed by the LLMs and surgeons to assess their radiological interpretation abilities. Spino-pelvic parameters were estimated from a scoliosis radiograph by the LLMs.

RESULTS

Qualitative content analysis revealed limitations in the LLMs' consideration of patient-specific factors and the breadth of treatment options. Both ChatGPT and Claude provided detailed descriptions of magnetic resonance imaging findings but differed from the surgeons in terms of specific levels and severity of pathologies. The LLMs acknowledged the limitations of accurately measuring spino-pelvic parameters without specialized tools. The accuracy of surgical decision-making for the LLMs (20%) was lower than that of the attending surgeons (100%). Statistical analysis showed no significant differences in accuracy between the groups.

CONCLUSIONS

The study highlights the potential of LLMs in assisting with radiological interpretation and surgical decision-making in spine surgery. However, the current limitations, such as the lack of consideration for patient-specific factors and inaccuracies in treatment recommendations, emphasize the need for further refinement and validation of these artificial intelligence (AI) models. Continued collaboration between AI researchers and clinical experts is crucial to address these challenges and realize the full potential of AI in spine surgery.

Similar articles

1
Comparative Analysis of Large Language Models and Spine Surgeons in Surgical Decision-Making and Radiological Assessment for Spine Pathologies.
World Neurosurg. 2025 Feb;194:123531. doi: 10.1016/j.wneu.2024.11.114. Epub 2024 Dec 23.
2
Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study.
J Med Internet Res. 2025 Jun 4;27:e67489. doi: 10.2196/67489.
3
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
4
Stench of Errors or the Shine of Potential: The Challenge of (Ir)Responsible Use of ChatGPT in Speech-Language Pathology.
Int J Lang Commun Disord. 2025 Jul-Aug;60(4):e70088. doi: 10.1111/1460-6984.70088.
5
Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers.
J Med Internet Res. 2025 Jul 14;27:e64452. doi: 10.2196/64452.
6
Large Language Models and Empathy: Systematic Review.
J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
7
"Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines?
Clin Orthop Relat Res. 2024 Dec 1;482(12):2098-2106. doi: 10.1097/CORR.0000000000003234. Epub 2024 Sep 6.
8
Comparative analysis of language models in addressing syphilis-related queries.
Med Oral Patol Oral Cir Bucal. 2025 Jul 1;30(4):e551-e560. doi: 10.4317/medoral.27092.
9
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.
JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.
10
Accuracy of large language models in generating differential diagnosis from clinical presentation and imaging findings in pediatric cases.
Pediatr Radiol. 2025 Jul 12. doi: 10.1007/s00247-025-06317-z.

Cited by

1
Current trends and future prospects of language models and processing systems in spine surgery - a scoping review.
Neurosurg Rev. 2025 Sep 5;48(1):633. doi: 10.1007/s10143-025-03785-7.
2
Evaluation of the performance of large language models in endoscopic lumbar surgery: a comparative analysis.
Ann Med Surg (Lond). 2025 Jun 30;87(8):4835-4840. doi: 10.1097/MS9.0000000000003519. eCollection 2025 Aug.
3
Letter to the Editor concerning "AI versus the spinal surgeons in the management of controversial spinal surgery scenarios" by Mehmet, S. et al. (Eur spine J [2025]: doi.org/10.1007/s00586-025-08825-w).
Eur Spine J. 2025 May 17. doi: 10.1007/s00586-025-08932-8.
4
Artificial intelligence in neurovascular decision-making: a comparative analysis of ChatGPT-4 and multidisciplinary expert recommendations for unruptured intracranial aneurysms.
Neurosurg Rev. 2025 Feb 21;48(1):261. doi: 10.1007/s10143-025-03341-3.