Huang Junzhang, Liu Wencai
Department of General Surgery, Lianjiang Traditional Chinese Medicine Hospital, Zhanjiang, Guangdong, China.
Department of Orthopedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Medicine (Baltimore). 2025 May 30;104(22):e42690. doi: 10.1097/MD.0000000000042690.
This study compared the performance of 4 machine learning approaches (Lasso regression, random forest, the Boruta algorithm, and the Boruta algorithm combined with Lasso regression) in predicting stroke risk among hypertensive patients, and evaluated the strengths and weaknesses of each to identify a more clinically valuable prediction model. The study included 3472 hypertensive patients, of whom 312 (9.0%) had experienced a stroke and 3160 had not. Various health indicators were analyzed with each of the 4 approaches. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), the precision-recall curve, the calibration curve, and decision curve analysis, which assess classification ability, precision, calibration, and clinical benefit, respectively. The Lasso regression and Boruta algorithm models both achieved an AUC of 0.716, the highest classification ability among the 4 models. The Boruta algorithm combined with Lasso regression achieved an AUC of 0.705, slightly lower than the previous 2 models but still showing comparable predictive capability, with better interpretability owing to feature selection. The random forest model had the lowest AUC (0.626), indicating weaker classification performance than the others. Overall, the Lasso regression and Boruta algorithm models performed similarly in classification ability, both demonstrating moderate predictive power, while the random forest model performed relatively poorly. The Boruta algorithm combined with Lasso regression was precise in variable selection but had limited clinical utility. Therefore, the Lasso regression model appears to be the most balanced for predicting stroke risk and is the recommended model based on this study.
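As a rough illustration of two of the evaluation measures named above, the sketch below computes an AUC and a decision-curve net benefit in plain Python. It is not the authors' code: the function names, the toy labels, and the toy risk scores are all hypothetical, and the net benefit uses the standard decision curve formula, TP/n - (FP/n) * pt/(1 - pt), at a chosen threshold probability pt.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Decision-curve net benefit when every patient whose predicted
    risk is at or above `threshold` is treated as high risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = stroke, 0 = no stroke.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1]

if __name__ == "__main__":
    print("AUC:", round(roc_auc(toy_labels, toy_scores), 3))
    for pt in (0.3, 0.5):
        nb = net_benefit(toy_labels, toy_scores, pt)
        print("net benefit at threshold", pt, ":", round(nb, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why values near 0.7 are described as moderate. Net benefit is usually compared against the "treat all" and "treat none" strategies across a range of thresholds, so a model can rank patients reasonably well and still offer limited clinical benefit.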