Garcia-Montemayor Victoria, Martin-Malo Alejandro, Barbieri Carlo, Bellocchio Francesco, Soriano Sagrario, Pendon-Ruiz de Mier Victoria, Molina Ignacio R, Aljama Pedro, Rodriguez Mariano
Department of Nephrology, Reina Sofia University Hospital, Cordoba, Spain.
Maimonides Biomedical Research Institute of Cordoba (IMIBIC), Reina Sofia University Hospital, University of Cordoba, Spain.
Clin Kidney J. 2020 Aug 11;14(5):1388-1395. doi: 10.1093/ckj/sfaa126. eCollection 2021 May.
Besides the classic logistic regression analysis, non-parametric methods based on machine learning techniques such as random forest are presently used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients.
Data were acquired from incident haemodialysis patients between 1995 and 2015. Mortality at 6 months, 1 year and 2 years of haemodialysis was predicted using random forest, and the accuracy was compared with that of logistic regression. Baseline data were constructed from information obtained during the initial period of regular haemodialysis. To improve the accuracy of each patient's baseline information, data were collected over periods of 30, 60 and 90 days after the first haemodialysis session.
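As an illustration of the baseline windows described above, the following is a minimal sketch of how one laboratory value might be summarised over the first 30, 60 and 90 days of haemodialysis. The function name `baseline_value`, the use of a simple mean and the haemoglobin values are all assumptions made for illustration; the abstract does not state how measurements within each window were aggregated.

```python
from datetime import date, timedelta
from statistics import mean


def baseline_value(first_session, measurements, window_days):
    """Mean of the measurements taken within window_days of the first session.

    measurements is a list of (date, value) pairs. Averaging is an assumed
    aggregation rule, not one taken from the study.
    """
    cutoff = first_session + timedelta(days=window_days)
    in_window = [v for d, v in measurements if first_session <= d <= cutoff]
    return mean(in_window) if in_window else None


# Hypothetical haemoglobin values (g/dL) for one patient.
first_session = date(2010, 1, 4)
haemoglobin = [
    (date(2010, 1, 10), 10.0),
    (date(2010, 2, 20), 11.0),
    (date(2010, 3, 25), 12.0),
]

# One baseline value per collection window (30, 60 and 90 days).
baselines = {w: baseline_value(first_session, haemoglobin, w) for w in (30, 60, 90)}
```

A longer window uses more measurements per patient, which is the sense in which it may give a more reliable baseline.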
A total of 1571 incident haemodialysis patients were included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The random forest mortality prediction models appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68-0.73] and superior to the logistic regression models (ΔAUC 0.007-0.046). The results indicate that random forest and logistic regression build their mortality prediction models using different variables.
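To make the AUC and ΔAUC figures concrete, here is a minimal sketch of how the two quantities can be computed from predicted risks. It uses the rank-based definition of AUC (the probability that a patient who died receives a higher predicted risk than one who survived, with ties counted as one half). The outcome and risk lists are hypothetical and are not data from the study; `risk_rf` and `risk_lr` simply stand in for the outputs of two already fitted models.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died) and predicted risks from two models.
died = [1, 1, 1, 0, 0, 0, 0, 0]
risk_rf = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1]  # stand-in for random forest
risk_lr = [0.8, 0.5, 0.3, 0.6, 0.4, 0.2, 0.1, 0.1]  # stand-in for logistic regression

# Positive values favour the first model.
delta_auc = auc(died, risk_rf) - auc(died, risk_lr)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so a ΔAUC in the range reported above represents a modest gain in discrimination.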
Random forest is an adequate method for generating mortality prediction models in haemodialysis patients, and it is superior to logistic regression.