Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia.

Affiliations

Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia.

School of Mathematics, Statistics, and Computer Science, KwaZulu-Natal University, Durban, South Africa.

Publication information

BMC Med Inform Decis Mak. 2023 May 22;23(1):98. doi: 10.1186/s12911-023-02185-5.


DOI:10.1186/s12911-023-02185-5
PMID:37217892
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10201795/
Abstract

INTRODUCTION: The prevalence of end-stage renal disease has raised the need for renal replacement therapy over recent decades. Although a kidney transplant offers a better quality of life and a lower cost of care than dialysis, graft failure is possible after transplantation. Hence, this study aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models.

METHODOLOGY: The data were extracted from a retrospective cohort of kidney transplant recipients at the Ethiopian National Kidney Transplantation Center, covering September 2015 to February 2022. In response to the imbalanced nature of the data, we performed hyperparameter tuning, probability threshold moving, tree-based ensemble learning, stacking ensemble learning, and probability calibration to improve the prediction results. Merit-based selected probabilistic models (logistic regression, naive Bayes, and artificial neural network) and tree-based ensemble models (random forest, bagged tree, and stochastic gradient boosting) were applied. Models were compared in terms of discrimination and calibration performance. The best-performing model was then used to predict the risk of graft failure.

RESULTS: A total of 278 complete cases were analyzed, with 21 graft failures and 3 events per predictor. Of these, 74.8% were male and 25.2% were female, with a median age of 37. In the comparison of individual models, the bagged tree and random forest had the top and equal discrimination performance (AUC-ROC = 0.84), while the random forest had the best calibration performance (Brier score = 0.045). When each individual model was tested as the meta-learner for stacking ensemble learning, stochastic gradient boosting as the meta-learner had the top discrimination (AUC-ROC = 0.88) and calibration (Brier score = 0.048) performance. Regarding feature importance, chronic rejection, blood urea nitrogen, number of post-transplant admissions, phosphorus level, acute rejection, and urological complications were the top predictors of graft failure.

CONCLUSIONS: Bagging, boosting, and stacking, combined with probability calibration, are good choices for clinical risk prediction on imbalanced data. A data-driven probability threshold is more beneficial than the natural threshold of 0.5 for improving prediction results from imbalanced data. Integrating various techniques in a systematic framework is a smart strategy for improving prediction results from imbalanced data. Clinical experts in kidney transplantation are recommended to use the final calibrated model as a decision support system to predict the risk of graft failure for individual patients.
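
The abstract names two ideas that a small example may make more concrete: the Brier score as a calibration measure, and moving the probability threshold away from 0.5 on imbalanced data. The sketch below is illustrative only. The toy labels and probabilities are invented and are not the study data, the function names are hypothetical, and choosing the threshold by Youden's J (sensitivity + specificity - 1) is an assumption, since the abstract does not say which data-driven rule the authors used.

```python
from typing import Sequence, Tuple

# Invented toy data (NOT from the study): 2 events among 10 cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.02, 0.05, 0.05, 0.10, 0.10, 0.15, 0.20, 0.30, 0.25, 0.40]


def brier_score(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def sens_spec(labels: Sequence[int], probs: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting 1 for prob >= threshold."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Pick the observed probability that maximizes Youden's J (an assumed rule)."""
    def youden_j(threshold: float) -> float:
        sensitivity, specificity = sens_spec(labels, probs, threshold)
        return sensitivity + specificity - 1

    return max(sorted(set(probs)), key=youden_j)


if __name__ == "__main__":
    print("Brier score:", round(brier_score(y_true, y_prob), 5))
    print("At 0.5 (sens, spec):", sens_spec(y_true, y_prob, 0.5))
    best = youden_threshold(y_true, y_prob)
    print("Data-driven threshold:", best)
    print("At that threshold (sens, spec):", sens_spec(y_true, y_prob, best))
```

On this toy data no predicted probability reaches 0.5, so the natural threshold flags no events at all, whereas the lower data-driven threshold recovers both events at the cost of one false positive. In practice the threshold would be chosen on training or validation data rather than on the data used to report performance.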


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/95d94ab79850/12911_2023_2185_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/a6b2f2bc6d62/12911_2023_2185_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/8f5942aae0d1/12911_2023_2185_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/c10648ccd6de/12911_2023_2185_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/c2507951c9c8/12911_2023_2185_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/1df14328618c/12911_2023_2185_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/4ac1e9ee916c/12911_2023_2185_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/0272073ec197/12911_2023_2185_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/2c2e7edb3f9f/12911_2023_2185_Fig8_HTML.jpg
