Shu Peng, Wang Xia, Wen Zhuping, Chen Jie, Xu Fang
The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
Front Med (Lausanne). 2025 Jul 7;12:1615950. doi: 10.3389/fmed.2025.1615950. eCollection 2025.
Patients undergoing maintenance hemodialysis face a high mortality rate, yet effective tools for predicting mortality risk in this population are lacking. This study aims to develop an interpretable machine learning model to predict mortality risk among maintenance hemodialysis patients.
A retrospective analysis was conducted on clinical data from 512 maintenance hemodialysis patients treated at The Central Hospital of Wuhan between January 2021 and October 2024. The dataset included 50 feature variables. The data were split into a training set (70%) and a test set (30%). Five machine learning models (Random Forest, Extreme Gradient Boosting, Support Vector Machine, Logistic Regression, and K-Nearest Neighbor) were trained and evaluated for predicting patient mortality risk, using metrics such as the F1 score, precision, accuracy, AUC-ROC, and recall. SHAP values were used to assess the contribution of each feature in the best-performing model.
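As an illustration of the 70/30 split and the threshold-based metrics named above, the following is a minimal standard-library sketch. It is not the authors' code: the function names `split_70_30` and `classification_metrics`, the random seed, the simple unstratified shuffle, and the rounding rule are all assumptions, since the abstract does not say how the split was performed.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle a copy of rows and split it into 70% training / 30% test.

    Assumption: a plain random shuffle; the abstract does not state
    whether the split was stratified by outcome.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With 512 rows, this rounding rule gives 358 training rows and 154 test rows; the exact counts used in the study are not given in the abstract.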
The K-Nearest Neighbor model achieved the highest AUC-ROC of 0.9792 (95% CI: 0.9600-0.9929). SHAP analysis identified key factors influencing predictions, including dialysis duration, creatinine levels, white blood cell ratio, blood phosphorus concentration, and unconjugated iron.
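To make concrete what an AUC-ROC with a 95% confidence interval represents, here is a hedged sketch that computes AUC as a pairwise ranking probability and derives a percentile bootstrap interval. The abstract does not state how the reported interval was obtained, so the bootstrap approach, the function names `auc_roc` and `bootstrap_auc_ci`, and the defaults (`n_boot`, `alpha`, `seed`) are assumptions for illustration only.

```python
import random


def auc_roc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC.

    Assumption: resampling test-set cases with replacement; resamples
    that contain only one class are skipped and redrawn.
    """
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:
            continue
        aucs.append(auc_roc(labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

The pairwise definition is quadratic in the number of cases, which is acceptable for a test set of this size; a rank-based formula would be preferable for much larger datasets.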
The K-Nearest Neighbor model demonstrated high efficacy in predicting mortality risk among hemodialysis patients. SHAP analysis highlighted critical risk factors. While these findings show promise for future clinical research, they should be interpreted with caution due to the study's retrospective design and the need for external validation.