Department of Spine Surgery, The Sixth Affiliated Hospital of Xinjiang Medical University, Urumqi, 830000, Xinjiang, People's Republic of China.
Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China.
Eur J Med Res. 2024 Jul 25;29(1):383. doi: 10.1186/s40001-024-01988-0.
Tuberculosis spondylitis (TS), commonly known as Pott's disease, is a severe form of skeletal tuberculosis that typically requires surgical treatment. However, surgery raises healthcare costs when the postoperative length of stay (PLOS) is prolonged, so identifying risk factors for extended PLOS is necessary. In this study, we aimed to develop an interpretable machine learning model that predicts extended PLOS and can provide insights for treatment, and to implement it as a web-based application.
We obtained patient data from the spine surgery department at our hospital. Extended postoperative length of stay (PLOS) was defined as a hospitalization duration at or above the 75th percentile following spine surgery. To identify relevant variables, we used several approaches: the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE) based on support vector machine classification (SVC), correlation analysis, and permutation importance. Several models were implemented, and some of them were combined into ensembles using soft voting. Models were built using grid search with nested cross-validation. The performance of each algorithm was assessed with several metrics, including the area under the receiver operating characteristic curve (AUC) and the Brier score. Models were interpreted using Shapley additive explanations (SHAP), the Gini impurity index, permutation importance, and local interpretable model-agnostic explanations (LIME). To facilitate practical use of the model, a web-based interface was developed and deployed.
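As a rough illustration of three ideas in the methods (the 75th-percentile outcome definition, soft voting, and the AUC and Brier score metrics), the following minimal sketch uses only the Python standard library. The stays and model probabilities are invented toy values, not study data, and all function names are hypothetical. The study's own implementation is not described at this level of detail.

```python
import statistics

def label_extended_plos(stays):
    """Label a stay 1 if it is at or above the 75th percentile, else 0."""
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = statistics.quantiles(stays, n=4, method="inclusive")[2]
    return [1 if s >= q3 else 0 for s in stays], q3

def soft_vote(prob_lists):
    """Soft voting: average each patient's predicted probability across models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]

def auc(y_true, y_prob):
    """AUC as the chance a positive case outranks a negative one (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Toy postoperative stays in days (assumed values for illustration only).
stays = [5, 7, 8, 9, 10, 12, 14, 21]
y, q3 = label_extended_plos(stays)

# Hypothetical predicted probabilities from two already-fitted models.
model_a = [0.1, 0.2, 0.2, 0.3, 0.4, 0.8, 0.7, 0.9]
model_b = [0.2, 0.1, 0.3, 0.2, 0.5, 0.8, 0.8, 0.8]
ensemble = soft_vote([model_a, model_b])

print("Q3 threshold:", q3)
print("labels:", y)
print("AUC:", round(auc(y, ensemble), 3))
print("Brier score:", round(brier_score(y, ensemble), 3))
```

A higher AUC indicates better discrimination, while a lower Brier score indicates better-calibrated probabilities. In practice, libraries such as scikit-learn provide equivalent metrics (`roc_auc_score`, `brier_score_loss`).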
The study included a cohort of 580 patients, and 11 features were selected: CRP, transfusions, infusion volume, blood loss, X-ray bone bridge, X-ray osteophyte, CT-vertebral destruction, CT-paravertebral abscess, MRI-paravertebral abscess, MRI-epidural abscess, and postoperative drainage. Most of the classifiers performed well; among them, the XGBoost model had a higher AUC (0.86) and a lower Brier score (0.126) and was chosen as the optimal model. The calibration and decision curve analysis (DCA) plots indicated that XGBoost achieved promising performance. After tenfold cross-validation, the XGBoost model had a mean AUC of 0.85 ± 0.09. SHAP and LIME were used to display each variable's contribution to the predicted value. The stacked bar plots indicated that infusion volume was the primary contributor according to the Gini index, permutation feature importance (PFI), and the LIME algorithm.
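Permutation feature importance, one of the measures used to rank contributors, can be sketched as follows: shuffle one feature column at a time and record how much the AUC drops. This is a minimal standard-library illustration under stated assumptions. The data are synthetic, `toy_model` is a hypothetical fixed scorer rather than the study's XGBoost model, and the feature names are borrowed only as labels.

```python
import random

def auc(y_true, y_score):
    """AUC as the chance a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def toy_model(row):
    """Hypothetical scorer: leans on feature 0, slightly on feature 1, ignores feature 2."""
    return 0.8 * row[0] + 0.2 * row[1]

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean AUC drop per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = auc(y, [model(r) for r in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)  # break the link between feature j and the outcome
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            drops.append(baseline - auc(y, [model(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Synthetic data: the outcome depends only on feature 0 (assumption for illustration).
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(60)]
y = [1 if row[0] > 0.6 else 0 for row in X]

feature_names = ["infusion_volume", "blood_loss", "unused_feature"]  # labels only
baseline, importances = permutation_importance(toy_model, X, y)
for name, imp in zip(feature_names, importances):
    print(f"{name}: mean AUC drop = {imp:.3f}")
```

A feature the model ignores yields a drop of exactly zero, while a feature the model depends on yields a large drop. This is the sense in which a top-ranked variable is called the primary contributor.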
Our methods not only predicted extended PLOS effectively but also identified risk factors that can inform future treatment. The XGBoost model developed in this study is accessible through the deployed web application and can aid clinical research.