Najjar Elie, Abdelazim Hassan Ahmed, Muscogliati Rodrigo, Salem Khalid M, Quraishi Nasir A
Centre for Spinal Studies and Surgery, Queens Medical Centre, Nottingham University Hospitals NHS Trust, Nottingham, United Kingdom.
Centre for Spinal Studies and Surgery, Queens Medical Centre, Nottingham University Hospitals NHS Trust, Nottingham, United Kingdom; Department of Orthopedics and Trauma Surgery, Assiut University School of Medicine, Assiut, Egypt.
Spine J. 2025 May 8. doi: 10.1016/j.spinee.2025.05.026.
BACKGROUND CONTEXT: Cauda equina syndrome (CES) is a spinal surgical emergency requiring prompt intervention to prevent neurological deficits. Accurate identification of CES cases needing urgent surgery is essential to avoid long-term sequelae.
PURPOSE: To evaluate the concordance between an AI language model (ChatGPT) and a Spinal Multidisciplinary Team (MDT) in recommending surgical intervention for suspected CES cases.
STUDY DESIGN/SETTING: Retrospective concordance analysis comparing surgical recommendations between ChatGPT and a Spinal MDT.
PATIENT SAMPLE: Among 160 referrals presenting with red flags for possible CES, 10 cases were used to calibrate ChatGPT to specific clinical and diagnostic parameters, with the remaining 150 cases included in the primary analysis. The average patient age was 50.6 years (range 18-87), with a male-to-female ratio of 68:82.
OUTCOME MEASURES: The primary outcome was the concordance rate between ChatGPT and the MDT in recommending surgery, evaluated through agreement rates and statistical analysis.
METHODS: Each of the 150 cases was presented as standardized slides including clinical history, imaging, and examination findings. Both the MDT and ChatGPT assessed the need for urgent surgery. Discordant cases (n=17) were further reviewed by 3 spinal surgeons blinded to prior decisions.
RESULTS: ChatGPT and the MDT agreed on surgical recommendations in 133 of 150 cases, an 88.7% concordance (Cohen's kappa = 0.764, p<.001). Within the 17 discordant cases, ChatGPT recommended surgery more often than the MDT, but this difference was not statistically significant (McNemar's test statistic = 1.23, p=.46). The 3 independent surgeons reached consensus on 11 of the 17 discordant cases (64.7%), highlighting variability among experts; individual surgeons aligned with ChatGPT in 5 to 6 cases each (29.4%-35.3%).
CONCLUSIONS: The substantial agreement between ChatGPT and the MDT suggests that ChatGPT has comparable sensitivity in identifying surgical candidates among suspected CES cases. Variability among surgeons on the discordant cases underscores the subjectivity of CES triage. ChatGPT may be a valuable adjunct in high-stakes clinical decision-making, though further validation and refinement are needed.
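The agreement figures reported in the results can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' analysis code. The function names and the 2x2 cell layout (`a`, `b`, `c`, `d`) are illustrative assumptions. The abstract does not publish the full contingency table, so Cohen's kappa and the McNemar statistic are given only as general formulas and are not evaluated on study data.

```python
# Sketch for checking the agreement arithmetic stated in the abstract.
# Names and the 2x2 cell layout are assumptions made for illustration.

def percent(count, total):
    """Return count/total as a percentage."""
    return 100.0 * count / total


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for two raters making a binary decision.

    a: both raters recommend surgery
    b: rater 1 recommends surgery, rater 2 does not
    c: rater 1 does not, rater 2 recommends surgery
    d: neither rater recommends surgery
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the marginal totals of each rater.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def mcnemar_chi2(b, c, continuity=True):
    """McNemar chi-square statistic from the two discordant counts."""
    diff = abs(b - c)
    if continuity:
        # Continuity correction; whether the study applied it is not stated.
        diff = max(diff - 1, 0)
    return diff ** 2 / (b + c)


# Proportions stated in the abstract.
print(round(percent(133, 150), 1))  # 88.7  (overall concordance)
print(round(percent(11, 17), 1))    # 64.7  (surgeon consensus on discordant cases)
print(round(percent(5, 17), 1))     # 29.4  (lower bound, surgeon aligned with ChatGPT)
print(round(percent(6, 17), 1))     # 35.3  (upper bound, surgeon aligned with ChatGPT)
```

Kappa is lower than the raw 88.7% agreement because it subtracts the agreement expected by chance from the raters' marginal rates. Reproducing the reported kappa and McNemar values would require the split of the 17 discordant cases between the two raters, which the abstract does not give.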