Li Tenghui, Qi Weihui, Mao Xinning, Jia Gaoyong, Zhang Wei, Li Xiaofeng, Pan Hao, Wang Dong
Department of Orthopaedics, Hangzhou Traditional Chinese Medicine Hospital Affiliated to Zhejiang Chinese Medical University, No. 453 Tiyuchang Rd, Hangzhou 310007, Zhejiang, China.
Department of Orthopaedics, Hangzhou Traditional Chinese Medicine Hospital Affiliated to Zhejiang Chinese Medical University, No. 453 Tiyuchang Rd, Hangzhou 310007, Zhejiang, China; Department of Orthopaedics, Hangzhou Dingqiao Hospital, No. 1630 Huanding Rd, Hangzhou 310021, Zhejiang, China.
Spine J. 2025 Apr 9. doi: 10.1016/j.spinee.2025.04.004.
The paraspinal muscles play a critical role in maintaining lumbar spine stability, and different muscles may have varying impacts on lumbar disc degeneration (LDD). However, studies exploring these relationships remain relatively limited.
This study aimed to investigate the relationship between various paravertebral muscles and LDD and to develop and validate a predictive model for LDD using machine learning (ML).
Retrospective cohort study.
A retrospective analysis was performed on hospitalized patients who underwent computed tomography (CT) and magnetic resonance imaging (MRI) examinations for chronic low back pain from February 2018 to January 2023.
The primary outcome measures included model performance metrics such as receiver operating characteristic (ROC) curves, accuracy, sensitivity, specificity, F1 score, positive predictive value (PPV), negative predictive value (NPV), and calibration curves. Clinical decision-making benefits were assessed using decision curve analysis (DCA). Secondary outcome measures focused on model interpretability, evaluated through SHapley Additive exPlanations (SHAP), which identified key predictors and quantified their contributions to LDD prediction.
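The threshold-based metrics listed above all follow from the four cells of a confusion matrix, and the quantity plotted in decision curve analysis is the net benefit at a chosen threshold probability. The sketch below uses the standard textbook definitions; the function names and the counts are hypothetical illustrations, not the study's code or data, and it assumes every denominator is nonzero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at one threshold probability (used in DCA)."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical counts, invented purely for illustration.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
nb = net_benefit(tp=80, fp=10, n=200, threshold=0.2)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the model against the "treat all" and "treat none" strategies.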
The internal cohort comprised 518 patients, who were randomly assigned to a training set (70%) and a test set (30%). The Synthetic Minority Oversampling Technique (SMOTE) was applied to mitigate class imbalance in the training set. Four machine learning models were developed, with parameters optimized by grid search and 10-fold cross-validation: Extreme Gradient Boosting (XGBoost), Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT). External validation was performed using data from 343 patients from different tertiary medical centers. Paraspinal muscle parameters on lumbar spine CT and MRI images were measured using ImageJ, and LDD was evaluated based on the Pfirrmann grading system. Spearman correlation analysis and logistic regression were performed to assess factors associated with LDD. Model performance was evaluated using ROC curves, accuracy, sensitivity, specificity, F1 score, PPV, NPV, calibration curves, and DCA. The SHAP method was employed to interpret the ML models.
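To make the resampling step concrete, the following is a minimal pure-Python sketch of a random 70/30 split and of the core SMOTE idea: each synthetic row is an interpolation between a minority-class row and one of its nearest minority-class neighbours. It is an illustration under stated assumptions, not the authors' pipeline; the function names, the feature rows, and the choice of `k` are all hypothetical, and in practice a maintained library implementation would normally be used.

```python
import math
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle 0..n-1 and split the indices into train and test lists."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority rows by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority rows to the base row, excluding itself.
        nearest = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(nearest)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class training rows: [age, PMI, multifidus fCSA].
# The values are invented for illustration only.
minority_train = [[34.0, 6.1, 9.8], [29.0, 6.6, 10.4], [41.0, 5.7, 9.1], [37.0, 5.9, 9.5]]
new_rows = smote(minority_train, n_synthetic=6, k=2)
train_idx, test_idx = split_indices(10)
```

Oversampling is applied after the split and only to the training rows, as in the description above, so that no synthetic data reaches the test or external validation sets. Because the neighbour search is distance-based, features on different scales would usually be standardized first.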
A total of 861 patients were analyzed (518 in the internal cohort and 343 in the external validation cohort). In the external validation cohort, the XGBoost model performed best, achieving an AUC of 0.880 (95% CI: 0.826-0.935). Its accuracy (0.819), specificity (0.841), and positive predictive value (PPV, 0.958) exceeded those of the other models, as did its sensitivity (0.814) and F1 score (0.880). SHAP analysis further identified age, the psoas muscle index (PMI), and the functional cross-sectional area (fCSA) of the multifidus muscle as critical predictors of LDD.
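As a quick consistency check on the reported figures, the F1 score is the harmonic mean of PPV and sensitivity, so it can be recomputed from the two values given for XGBoost in the external validation cohort. The helper name below is a hypothetical illustration; only the two input values come from the abstract.

```python
def f1_from_ppv_and_sensitivity(ppv, sensitivity):
    """F1 score as the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Values reported for XGBoost in the external validation cohort.
f1 = f1_from_ppv_and_sensitivity(ppv=0.958, sensitivity=0.814)
print(f"{f1:.3f}")  # prints 0.880
```

The recomputed value agrees with the reported F1 score to three decimal places.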
In this study, an LDD prediction model was developed using paravertebral muscle quantitative data and ML algorithms, with SHAP analysis incorporated to enhance model interpretability. The XGBoost model demonstrated the best predictive performance and holds potential to guide early clinical prevention and treatment.