1Department of Neurosurgery, Duke University, Durham, North Carolina.
2Department of Neurosurgery, University of California, San Francisco, California.
Neurosurg Focus. 2023 Jun;54(6):E5. doi: 10.3171/2023.3.FOCUS2372.
The purpose of this study was to evaluate how well different supervised machine learning algorithms predict achievement of the minimum clinically important difference (MCID) in neck pain after surgery in patients with cervical spondylotic myelopathy (CSM).
This was a retrospective analysis of the prospective Quality Outcomes Database CSM cohort. The data set was divided into an 80% training set and a 20% test set. Various supervised learning algorithms (including logistic regression, support vector machine, decision tree, random forest, extra trees, Gaussian naïve Bayes, k-nearest neighbors, multilayer perceptron, and extreme gradient boosted trees) were evaluated on their ability to predict achievement of MCID in neck pain at 3 and 24 months after surgery, given a set of baseline predictor features. Model performance was assessed with accuracy, F1 score, area under the receiver operating characteristic curve, precision, recall/sensitivity, and specificity.
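As an illustration of the evaluation scheme described above, the following minimal Python sketch shows an 80%/20% split and the threshold-based metrics computed from binary predictions. It is not the authors' code: the function names `split_80_20` and `classification_metrics`, the fixed random seed, and the toy labels are assumptions made for this example. The area under the receiver operating characteristic curve is omitted because it requires predicted probabilities rather than hard class labels.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of `rows` and return (train, test) as an 80%/20% split.

    The seed is an assumption for reproducibility; the source does not
    state how the split was randomized.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Compute metrics from binary labels (1 = achieved MCID, 0 = did not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 when the denominator is empty instead of raising.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Toy labels for illustration only; these are not study data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
train_rows, test_rows = split_80_20(range(100))
```

Precision is the fraction of predicted MCID achievers who truly achieved MCID, whereas recall is the fraction of true achievers that the model identified, which is why the two metrics can favor different models.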
In total, 535 patients (46.9%) achieved MCID for neck pain at 3 months and 569 patients (49.9%) achieved it at 24 months. In each follow-up cohort, 501 patients (93.6%) were satisfied at 3 months after surgery and 569 patients (100%) were satisfied at 24 months after surgery. Of the supervised machine learning algorithms tested, logistic regression demonstrated the best accuracy (3 months: 0.76 ± 0.031, 24 months: 0.773 ± 0.044) at predicting achievement of MCID for neck pain at both follow-up time points, with fair performance; its F1 score was 0.759 ± 0.019 at 3 months and 0.777 ± 0.039 at 24 months, and its area under the receiver operating characteristic curve was 0.762 ± 0.027 at 3 months and 0.773 ± 0.043 at 24 months. The best precision was also demonstrated by logistic regression at 3 months (0.724 ± 0.058) and 24 months (0.780 ± 0.097). The best recall/sensitivity was demonstrated by multilayer perceptron at 3 months (0.841 ± 0.094) and by extra trees at 24 months (0.817 ± 0.115). The highest specificity was shown by support vector machine at 3 months (0.952 ± 0.013) and by logistic regression at 24 months (0.747 ± 0.18).
Model selection for a study should be based on the strengths of each model and the aims of the study. Because the authors' aim was to maximize the proportion of predicted MCID achievers who truly achieved MCID in neck pain, and the data set was balanced, the appropriate metric for their study was precision. At both short- and long-term follow-up, logistic regression demonstrated the highest precision of all models tested. Logistic regression performed consistently the best of all models tested and remains a powerful model for clinical classification tasks.