Construction of a machine learning-based interpretable prediction model for acute kidney injury in hospitalized patients.

Author information

Yu Xiang, Wang WanLing, Wu RiLiGe, Gong XinYan, Ji YuWei, Feng Zhe

Affiliations

First Medical Center of Chinese PLA General Hospital, Department of Nephrology, First Medical Center of Chinese PLA General Hospital, State Key Laboratory of Kidney Diseases, National Clinical Research Center for Kidney Diseases, Beijing Key Laboratory of Medical Devices and Integrated Traditional Chinese and Western Drug Development for Severe Kidney Diseases, Beijing Key Laboratory of Digital Intelligent TCM for the Prevention and Treatment of Pan-vascular Diseases, Key Disciplines of National Administration of Traditional Chinese Medicine (zyyzdxk-2023310), Beijing, 100853, China.

Medical Innovation Research Division, Chinese PLA General Hospital, Beijing, 100853, China.

Publication information

Sci Rep. 2025 Mar 18;15(1):9313. doi: 10.1038/s41598-025-90459-5.

DOI:10.1038/s41598-025-90459-5

PMID:40102467

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11920398/

Abstract

In this observational study, we used data from 59,936 hospitalized adults to construct a model. When built with all 53 variables, all five models achieved acceptable performance in the validation cohort, with the extreme gradient boosting (XGBoost) model showing the best predictive efficacy and stability (area under the curve (AUC), 0.9301). Among the simpler models built with the 39 significant variables selected by random forest recursive feature elimination, the XGBoost model again performed best (AUC, 0.9357). All the models showed a significant net benefit in decision curve analysis, and the XGBoost model achieved the best results. In addition, the Shapley additive explanation (SHAP) importance matrices revealed that uric acid, colloidal solution, first creatinine value on admission, pulse and albumin were the five most important variables under both modeling strategies. In the external validation cohort of 4,022 hospitalized patients, the performance of all models declined; the support vector machine (SVM) model showed the best predictive efficacy (AUCs of 0.8230 and 0.8329 for the two modeling strategies), followed by the XGBoost model (0.8124 and 0.8316). Thus, our model can predict the occurrence and risk of acute kidney injury (AKI) up to 48 h in advance, enabling clinicians to assess the risk of AKI in hospitalized patients more accurately and intuitively and to develop necessary AKI management strategies.
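The net benefit that a decision curve plots can be illustrated with a minimal sketch. This uses the standard definition, net benefit = TP/n - (FP/n) x pt/(1 - pt) at a risk threshold pt; it is not the authors' code. The function names and the toy labels and probabilities below are hypothetical and are not data from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging every patient whose predicted risk >= threshold.

    Standard decision-curve definition: TP/n - FP/n * pt / (1 - pt).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: flag every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy cohort: 1 = AKI occurred, 0 = no AKI (not study data).
toy_labels = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.25]

model_nb = net_benefit(toy_labels, toy_probs, 0.5)
treat_all_nb = treat_all_net_benefit(toy_labels, 0.5)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all reference and zero (the treat-none strategy). Sweeping the threshold over a clinically plausible range yields the decision curve.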
