Ndjonko Laura C M, Chakraborty Aritra, Petri Francesco, Alavi Seyed Mohammad Amin, Matsuo Takahiro, Borgonovo Fabio, Comba Isin Y, Murad Mohammad H, Nassr Ahmad, El-Zein Said, Berbari Elie F
Northwestern University, 633 Clark St, Evanston, IL 60208, USA.
Loyola University Chicago Stritch School of Medicine, 2160 South First Avenue, Maywood, IL 60153.
Spine J. 2025 Jul 14. doi: 10.1016/j.spinee.2025.07.032.
Surgical site infections (SSIs) are a significant complication following spinal surgery. These infections contribute to increased morbidity, prolonged hospital stays, and substantial healthcare costs. Traditional statistical models have been widely used to predict SSI risk, but artificial intelligence (AI) and its machine learning (ML) methods have also been used for SSI prediction.
This systematic review aims to evaluate the predictive accuracy of AI models versus traditional statistical models in assessing SSI risk following spinal surgery.
STUDY DESIGN/SETTING: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
We searched Medline, Embase, Scopus, Web of Science, and ClinicalTrials.gov. Studies were included if they developed predictive models for SSI following spinal surgery using either AI or traditional statistical approaches. Risk of bias for all studies was assessed using the Prediction model Risk of Bias Assessment Tool (PROBAST). Predictive model performance was compared using metrics such as the C-statistic and the area under the receiver operating characteristic curve (AUC-ROC).
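For readers less familiar with the performance metric named above, the following is a minimal sketch of how a C-statistic can be computed for a binary outcome such as SSI, where it equals the AUC-ROC. It is illustrative only and is not taken from any of the reviewed studies; the function name `c_statistic` and the toy data are hypothetical.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance statistic for a binary outcome.

    Returns the fraction of (event, non-event) pairs in which the event
    case received the higher predicted risk; tied scores count as half.
    For binary outcomes this equals the area under the ROC curve.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: 1 = SSI, 0 = no SSI, with predicted risks.
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
y_score = [0.05, 0.10, 0.35, 0.40, 0.60, 0.80, 0.20, 0.15]

auc = c_statistic(y_true, y_score)
print(f"C-statistic: {auc:.3f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of infected from non-infected patients. The statistic measures ranking only and says nothing about whether the predicted probabilities themselves are accurate.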
A total of 51 studies were included. Of these, 42 used traditional statistical methods and 9 used AI/ML models. Logistic regression was the most common method among traditional models (95.2%). All of the ML studies used supervised models trained on tabular data; decision-tree-based and linear algorithms were the most common (n = 7, 77.8% each), followed by neural networks and support vector machines (n = 4, 44.4% each). Traditional models achieved a C-statistic between 0.7 and 0.8 in 40.5% of studies (n = 17), with only 4.8% (n = 2) exceeding 0.9. AI models achieved a C-statistic of 0.9 or higher in 44.4% of studies (n = 4). However, although 77.8% of the ML studies (n = 7) performed internal cross-validation, only 33.3% (n = 3) reported calibration data and none of the models was externally validated, which raises important concerns about their current clinical applicability and generalizability.
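Because calibration is reported far less often than discrimination, it may help to show what a basic calibration check looks like. The sketch below is a hypothetical illustration, not a method used by the reviewed studies: it sorts patients by predicted risk, splits them into equal-sized groups, and compares the mean predicted risk with the observed event rate in each group. The function name `calibration_table` and the toy data are assumptions.

```python
def calibration_table(y_true, y_prob, n_bins=2):
    """Grouped calibration summary.

    Sorts cases by predicted probability, splits them into n_bins groups
    of roughly equal size, and returns one tuple per group:
    (mean predicted risk, observed event rate, group size).
    """
    pairs = sorted(zip(y_prob, y_true))
    size = len(pairs) / n_bins
    rows = []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        if not chunk:
            continue  # skip empty groups when there are few cases
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


# Hypothetical toy data: predicted SSI risk and observed outcome (1 = SSI).
y_prob = [0.1, 0.1, 0.2, 0.2, 0.7, 0.8, 0.9, 0.6]
y_true = [0, 0, 0, 1, 1, 1, 1, 0]

rows = calibration_table(y_true, y_prob, n_bins=2)
for mean_pred, observed, n in rows:
    print(f"n={n}  mean predicted={mean_pred:.2f}  observed rate={observed:.2f}")

# Calibration-in-the-large: overall mean prediction vs overall event rate.
mean_overall = sum(y_prob) / len(y_prob)
rate_overall = sum(y_true) / len(y_true)
```

In a well-calibrated model the mean predicted risk and the observed rate are close within every group. A model can reach a high C-statistic while still systematically over- or under-estimating risk, which is why calibration and external validation are reported separately from discrimination.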
This systematic review, the first of its kind, observed that studies using ML models reported potentially excellent classification accuracy in predicting SSI following spinal surgery. However, current methodological shortcomings limit their generalizability and immediate clinical implementation. Most ML studies remain in the early stages of development, and their reports of excellent performance should be interpreted with caution. This review highlights the need for standardized model benchmarking and for external validation to reliably assess generalizability. Furthermore, advancing beyond conventional tabular data by incorporating state-of-the-art AI models that leverage multimodal data could significantly expand the potential of predictive analytics in this domain and thereby help guide clinical decision-making.