Zhang Yunlu, Wang Yimei, Xu Jiarui, Zhu Bowen, Chen Xiaohong, Ding Xiaoqiang, Li Yang
Department of Nephrology, Zhongshan Hospital, Fudan University, Shanghai, People's Republic of China.
Shanghai Medical Center of Kidney, Shanghai, People's Republic of China.
Int J Gen Med. 2021 Apr 16;14:1325-1335. doi: 10.2147/IJGM.S302795. eCollection 2021.
Based on admission data, we applied the XGBoost algorithm to build a prediction model that estimates the risk of acute kidney injury (AKI) in patients with hepatobiliary malignancies, and compared its predictive performance with that of a logistic regression model.
We reviewed the clinical data of 7968 liver cancer patients and 589 gallbladder cancer patients admitted to Zhongshan Hospital during 2014 and 2015. Patients were randomly divided into a training set and a test set. Data were collected from the electronic medical record system. Prediction models were developed with XGBoost and with LASSO-logistic regression. Performance measures included the classification matrix, the area under the receiver operating characteristic curve (AUC), the lift chart and the learning curve.
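As an illustration of the AUC measure named above, the following is a minimal sketch in plain Python, not the authors' implementation. It computes the AUC as the probability that a randomly chosen AKI case receives a higher predicted risk than a randomly chosen non-AKI case. The function name `auc` and the toy labels and scores are hypothetical and are not study data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed AKI, 0 = did not.
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

print(auc(toy_labels, toy_scores))  # 0.875
```

The pairwise form is quadratic in the number of cases, which is acceptable for a small example; a rank-based calculation would be the usual choice for cohorts of several thousand patients.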
Of 6846 participants in the training set, 792 (11.6%) developed AKI. In the XGBoost model for liver cancer patients, the 3 most important variables for AKI were serum creatinine (SCr), estimated glomerular filtration rate (eGFR) and antitumor treatment. In the gallbladder cancer model, SCr and eGFR ranked second and third, after phosphorus. In the classification matrix, the XGBoost model showed better agreement between actual observations and predictions than the LASSO-logistic model. The Youden's index of the XGBoost model was 47.5% in liver cancer and 59.3% in gallbladder cancer, significantly higher than that of the LASSO-logistic model (41.6% and 32.7%, respectively). The AUCs of the XGBoost model were 0.822 in liver cancer and 0.850 in gallbladder cancer. By comparison, the AUCs of the logistic models were significantly lower, at 0.793 and 0.740 (p=0.024 and 0.018, respectively). As training samples accumulated, the XGBoost model maintained greater robustness in the learning curve.
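For readers unfamiliar with Youden's index, it is defined as sensitivity + specificity - 1 at a given classification threshold. The sketch below is a minimal plain-Python illustration under that standard definition, not the authors' code. The function names `youden_index` and `best_threshold` and the toy data are hypothetical.

```python
def youden_index(labels, scores, threshold):
    """Youden's J = sensitivity + specificity - 1 at one threshold.

    A case is predicted positive when its score is >= threshold.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_threshold(labels, scores):
    """Return the observed score that maximizes Youden's J."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: youden_index(labels, scores, t))


# Hypothetical toy data: 1 = developed AKI, 0 = did not.
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

cutoff = best_threshold(toy_labels, toy_scores)
print(cutoff, youden_index(toy_labels, toy_scores, cutoff))  # 0.6 0.75
```

In this toy example the cutoff of 0.6 gives a sensitivity of 0.75 and a specificity of 1.0, so J = 0.75, or 75% when expressed as a percentage as in the results above.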
The XGBoost model based on admission data predicted AKI with higher accuracy and greater robustness than the LASSO-logistic model. It can support AKI risk stratification in clinical practice and enable early intervention in patients with hepatobiliary malignancies.