Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia.

Author information

Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia.

School of Mathematics, Statistics, and Computer Science, KwaZulu-Natal University, Durban, South Africa.

Publication information

BMC Med Inform Decis Mak. 2023 May 22;23(1):98. doi: 10.1186/s12911-023-02185-5.

DOI:10.1186/s12911-023-02185-5

PMID:37217892

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10201495/

Abstract

INTRODUCTION

The rising prevalence of end-stage renal disease has increased the need for renal replacement therapy over recent decades. Although a kidney transplant offers a better quality of life and a lower cost of care than dialysis, graft failure remains possible after transplantation. Hence, this study aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models.

METHODOLOGY

Data were extracted from a retrospective cohort of kidney transplant recipients at the Ethiopian National Kidney Transplantation Center, covering September 2015 to February 2022. To address the imbalanced nature of the data, we applied hyperparameter tuning, probability threshold moving, tree-based ensemble learning, stacking ensemble learning, and probability calibration to improve the prediction results. Probabilistic models (logistic regression, naive Bayes, and artificial neural network) and tree-based ensemble models (random forest, bagged tree, and stochastic gradient boosting), selected on merit, were applied. Models were compared on discrimination and calibration performance, and the best-performing model was then used to predict the risk of graft failure.
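
As an illustration of the probability calibration step, the sketch below fits a Platt-style sigmoid to raw model scores in plain Python. The abstract does not state which calibration method was used, so Platt scaling is an assumption here, and the scores, labels, and function names (`fit_platt`, `log_loss`) are hypothetical toy values chosen only for demonstration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0  # start from the uncalibrated mapping
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def log_loss(probs, labels):
    """Mean negative log-likelihood of binary labels under the given probabilities."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical toy data: logit-like scores from an uncalibrated classifier,
# with a rare positive class (1 = graft failure).
scores = [-4.0, -3.5, -3.0, -2.5, -2.0, -1.0, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

a, b = fit_platt(scores, labels)
raw_probs = [sigmoid(s) for s in scores]
calibrated_probs = [sigmoid(a * s + b) for s in scores]
```

In practice the calibration map would be fitted on held-out or cross-validated predictions rather than on the data used to train the classifier; the toy example skips that split for brevity.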

RESULTS

A total of 278 complete cases were analyzed, with 21 graft failures and 3 events per predictor. Of the recipients, 74.8% were male and 25.2% were female, with a median age of 37 years. Among the individual models, the bagged tree and random forest had the highest, and equal, discrimination performance (AUC-ROC = 0.84), while the random forest had the best calibration performance (Brier score = 0.045). When each individual model was tested as the meta-learner for stacking ensemble learning, stochastic gradient boosting achieved the top discrimination (AUC-ROC = 0.88) and calibration (Brier score = 0.048) performance. In terms of feature importance, chronic rejection, blood urea nitrogen, number of post-transplant admissions, phosphorus level, acute rejection, and urological complications were the top predictors of graft failure.
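
To put the reported metrics in context, the following sketch computes the Brier score and AUC-ROC for a no-skill reference model that assigns every patient the overall event rate (21 failures in 278 cases). Only the two counts come from the abstract; the function names are illustrative, and because the abstract does not say which data split the reported scores were computed on, the comparison is indicative only.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def auc_roc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Counts taken from the abstract.
n_cases, n_events = 278, 21
prevalence = n_events / n_cases  # about 0.0755

# No-skill reference: predict the event rate for every patient.
labels = [1] * n_events + [0] * (n_cases - n_events)
baseline_probs = [prevalence] * n_cases

baseline_brier = brier_score(baseline_probs, labels)  # equals prevalence * (1 - prevalence)
baseline_auc = auc_roc(baseline_probs, labels)        # 0.5, no discrimination
```

With an event rate near 7.6%, the no-skill Brier score is about 0.070, so a model can score well below 0.1 without being informative. The reported values of 0.045 and 0.048 are best read against that reference rather than against zero.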

CONCLUSIONS

Bagging, boosting, and stacking, combined with probability calibration, are good choices for clinical risk prediction on imbalanced data. A data-driven probability threshold improves predictions from imbalanced data more than the default threshold of 0.5. Integrating these techniques in a systematic framework is an effective strategy for improving prediction results on imbalanced data. Clinical experts in kidney transplantation are advised to use the final calibrated model as a decision support system to predict the risk of graft failure for individual patients.
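
The point about a data-driven threshold can be sketched as follows. The abstract does not name the selection criterion, so maximizing Youden's J (sensitivity + specificity - 1) is an assumption, and the predicted probabilities and labels are hypothetical toy values.

```python
def sensitivity_specificity(probs, labels, threshold):
    """Classify as positive when prob >= threshold and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(probs, labels):
    """Scan the observed probabilities and keep the cut-off with the largest Youden's J."""
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        sens, spec = sensitivity_specificity(probs, labels, t)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical calibrated risks for 12 patients; 1 = graft failure (rare class).
probs = [0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.12, 0.15, 0.22, 0.30, 0.35, 0.41]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]

default_sens, default_spec = sensitivity_specificity(probs, labels, 0.5)
moved_t, moved_j = best_threshold(probs, labels)
moved_sens, moved_spec = sensitivity_specificity(probs, labels, moved_t)
```

On this toy data no predicted risk reaches 0.5, so the default threshold flags no failures at all, while the selected cut-off recovers every failure at the cost of some false positives. As with calibration, the threshold would normally be chosen on validation data, not on the test set.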
