Liew Bernard X W, Kovacs Francisco M, Rügamer David, Royuela Ana
School of Sport, Rehabilitation and Exercise Sciences, University of Essex, Colchester CO4 3SQ, Essex, UK.
Unidad de la Espalda Kovacs, HLA-Moncloa University Hospital, 28008 Madrid, Spain.
J Clin Med. 2023 Sep 27;12(19):6232. doi: 10.3390/jcm12196232.
This study aims to compare the variable selection strategies of different machine learning (ML) and statistical algorithms in the prognosis of neck pain (NP) recovery. A total of 3001 participants with NP were included. Three dichotomous outcomes were used: improvement in NP, in arm pain (AP), and in disability at 3 months follow-up. Twenty-five variables (twenty-eight parameters) were included as predictors. There were more parameters than variables because some categorical variables had >2 levels. Eight modelling techniques were compared: stepwise regression based on unadjusted p-values (stepP), on adjusted p-values (stepPAdj), and on the Akaike information criterion (stepAIC); best subset regression (BestSubset); least absolute shrinkage and selection operator (LASSO); minimax concave penalty (MCP); model-based boosting (mboost); and multivariate adaptive regression splines (MuARS). The algorithm that selected the fewest predictors was stepPAdj (number of predictors = 4 to 8). MuARS selected the second fewest predictors (9 to 14). Among the predictors selected by all algorithms, the one with the largest coefficient magnitude was "having undergone a neuroreflexotherapy intervention" for NP (β = 1.987 to 2.296) and AP (β = 2.639 to 3.554), and "Imaging findings: spinal stenosis" (β = -1.331 to -1.763) for disability. Stepwise regression based on adjusted p-values resulted in the sparsest models, which enhanced clinical interpretability. MuARS appears to provide the best balance between model sparsity and high predictive performance across outcomes. Different algorithms produced similar predictive performance but selected different numbers of variables. Rather than relying on any single algorithm, confidence in the variable selection may be increased by using multiple algorithms.
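The closing recommendation, to rely on agreement across several algorithms rather than on one, can be illustrated with a minimal Python sketch. It is not the authors' code: the predictor names, the selected sets, and the helper functions `selection_frequency` and `consensus` are all hypothetical and chosen only to show the tallying idea.

```python
from collections import Counter

# Hypothetical selections: algorithm name -> set of predictors it retained.
# These names and memberships are invented for illustration only;
# they are NOT the results reported in the study.
selections = {
    "stepPAdj": {"treatment", "stenosis"},
    "LASSO": {"treatment", "stenosis", "age", "duration"},
    "MuARS": {"treatment", "age"},
}


def selection_frequency(selections):
    """Count how many algorithms selected each predictor."""
    counts = Counter()
    for chosen in selections.values():
        counts.update(chosen)
    return counts


def consensus(selections, min_share=1.0):
    """Return, sorted by name, the predictors chosen by at least
    `min_share` (a fraction between 0 and 1) of the algorithms."""
    counts = selection_frequency(selections)
    n_algorithms = len(selections)
    return sorted(
        name for name, c in counts.items() if c / n_algorithms >= min_share
    )


freq = selection_frequency(selections)
unanimous = consensus(selections)                 # chosen by every algorithm
majority = consensus(selections, min_share=2 / 3)  # chosen by at least two of three
```

A predictor that survives under every algorithm is a stronger candidate than one retained by a single method; the threshold `min_share` is an arbitrary choice in this sketch, not something the abstract specifies.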