Petousis Panayiotis, Wilson James M, Gelvezon Alex V, Alam Shafiul, Jain Ankur, Prichard Laura, Elashoff David A, Raja Naveen, Bui Alex A T
UCLA Health Clinical and Translational Science Institute, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA 90024-2943, United States.
Department of Medicine, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, CA 90024-2943, United States.
JAMIA Open. 2024 Feb 27;7(1):ooae015. doi: 10.1093/jamiaopen/ooae015. eCollection 2024 Apr.
In the United States, end-stage kidney disease (ESKD) is responsible for high mortality and significant healthcare costs, and the number of cases has increased sharply over the past 2 decades. In this study, we aimed to reduce these impacts by developing a model that predicts the occurrence of ESKD within a 2-year period.
We developed a machine learning (ML) pipeline to test different models for the prediction of ESKD. The electronic health record was used to capture several kidney disease-related variables. Various imputation, feature selection, and sampling approaches were tested. We compared multiple ML models using the area under the ROC curve (AUCROC), the area under the precision-recall curve (PR-AUC), and the Brier score to assess discrimination, precision, and calibration, respectively. Explainability methods were applied to the final model.
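As an illustration of the three evaluation metrics named above, the following is a minimal sketch in plain Python. It is not the study's pipeline: the function names and the toy labels and scores are hypothetical, PR-AUC is approximated here by average precision (one common step-wise estimate), and the ranking assumes no tied scores.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: probability that a random positive case is scored
    higher than a random negative case (ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise estimate of PR-AUC: mean of the precision values observed
    at each rank where a positive case is retrieved (assumes no tied scores)."""
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Calibration: mean squared difference between predicted probability
    and the observed 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 2 positive cases among 8 patients.
    labels = [0, 0, 1, 0, 1, 0, 0, 0]
    scores = [0.10, 0.20, 0.90, 0.30, 0.25, 0.05, 0.40, 0.15]
    print("AUCROC:", auc_roc(labels, scores))
    print("PR-AUC (average precision):", average_precision(labels, scores))
    print("Brier score:", brier_score(labels, scores))
```

Reporting all three together is a reasonable design choice for a rare outcome: when positives are scarce, a ranking-based AUCROC can remain high while precision-based summaries are far lower, and the Brier score is small partly because most predicted probabilities sit close to the common negative outcome.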
Our best model was a gradient-boosting machine with feature selection and imputation methods as additional components. The model exhibited an AUCROC of 0.97, a PR-AUC of 0.33, and a Brier score of 0.002 on a holdout test set. A chart review analysis by expert physicians indicated clinical utility.
The ESKD prediction model can identify individuals at risk for ESKD and has been successfully deployed within our health system.