Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia.

Affiliations

Department of Statistics, Bahir Dar University, Bahir Dar, Ethiopia.

School of Mathematics, Statistics, and Computer Science, KwaZulu-Natal University, Durban, South Africa.

Publication information

BMC Med Inform Decis Mak. 2023 May 22;23(1):98. doi: 10.1186/s12911-023-02185-5.

DOI: 10.1186/s12911-023-02185-5
PMID: 37217892
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10201495/
Abstract

INTRODUCTION

Over recent decades, the prevalence of end-stage renal disease has increased the need for renal replacement therapy. Although a kidney transplant offers a better quality of life and a lower cost of care than dialysis, the graft can still fail after transplantation. This study therefore aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models.

METHODOLOGY

Data were extracted from a retrospective cohort of kidney transplant recipients at the Ethiopian National Kidney Transplantation Center, covering September 2015 to February 2022. Because the data were imbalanced, we applied hyperparameter tuning, probability threshold moving, tree-based ensemble learning, stacking ensemble learning, and probability calibration to improve the prediction results. Probabilistic models (logistic regression, naive Bayes, and artificial neural network) and tree-based ensemble models (random forest, bagged tree, and stochastic gradient boosting), selected on merit, were applied. The models were compared on discrimination and calibration performance. The best-performing model was then used to predict the risk of graft failure.
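The abstract does not say which probability calibration method was used. As a minimal sketch of the general idea only, the block below uses histogram binning, a technique swapped in here for illustration: raw model scores are grouped into equal-width bins and each bin is mapped to the event rate observed in it. The function names, the bin count, and the toy data are all hypothetical and are not taken from the study.

```python
from bisect import bisect_right

def fit_histogram_binning(scores, labels, n_bins=5):
    """Learn one calibrated probability per equal-width score bin on [0, 1]."""
    edges = [i / n_bins for i in range(1, n_bins)]   # interior bin edges
    totals = [0] * n_bins
    events = [0] * n_bins
    for s, y in zip(scores, labels):
        b = bisect_right(edges, s)                   # index of the bin holding s
        totals[b] += 1
        events[b] += y
    prior = sum(labels) / len(labels)                # fallback for empty bins
    rates = [events[b] / totals[b] if totals[b] else prior
             for b in range(n_bins)]
    return edges, rates

def calibrate(score, edges, rates):
    """Map a raw score to the observed event rate of its bin."""
    return rates[bisect_right(edges, score)]

# Hypothetical raw scores and outcomes (1 = graft failure), not study data.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
edges, rates = fit_histogram_binning(raw_scores, outcomes, n_bins=2)
print(rates, calibrate(0.45, edges, rates))
```

In practice the bins would be fitted on data held out from model training, so that the calibration map is not learned from the same cases the model has already seen.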

RESULTS

A total of 278 complete cases were analyzed, including 21 graft failures (3 events per predictor). Of the recipients, 74.8% were male and 25.2% were female, and the median age was 37 years. Among the individual models, the bagged tree and the random forest shared the best discrimination performance (AUC-ROC = 0.84), while the random forest had the best calibration performance (Brier score = 0.045). When each individual model was tested as the meta-learner for stacking ensemble learning, stochastic gradient boosting gave the best discrimination (AUC-ROC = 0.88) and calibration (Brier score = 0.048) performance. In terms of feature importance, chronic rejection, blood urea nitrogen, number of post-transplant admissions, phosphorus level, acute rejection, and urological complications were the top predictors of graft failure.
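The two metrics reported above can be computed directly from predicted probabilities and observed outcomes. The sketch below is illustrative and is not the authors' code: the function names and the toy data are hypothetical. It also works out the Brier score of a no-skill model that always predicts the cohort prevalence (21 of 278), which gives a rough reference point for reading the reported values. That reference assumes the evaluation data share the cohort prevalence, which the abstract does not confirm.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def auc_roc(probs, labels):
    """Probability that a random event is scored above a random non-event."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    # Count concordant pairs; tied scores count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions and outcomes (1 = graft failure), not study data.
toy_probs = [0.9, 0.8, 0.3, 0.2]
toy_labels = [1, 0, 1, 0]
toy_brier = brier_score(toy_probs, toy_labels)
toy_auc = auc_roc(toy_probs, toy_labels)

# No-skill reference: always predicting the prevalence p scores p * (1 - p).
prevalence = 21 / 278
no_skill_brier = prevalence * (1 - prevalence)
print(toy_brier, toy_auc, round(no_skill_brier, 3))
```

With so few events, a low Brier score is partly a consequence of the low event rate itself, which is why the no-skill reference of roughly 0.070 is a more useful yardstick than zero.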

CONCLUSIONS

Bagging, boosting, and stacking, combined with probability calibration, are good choices for clinical risk prediction on imbalanced data. A data-driven probability threshold improves predictions from imbalanced data more than the natural threshold of 0.5 does. Integrating these techniques in a systematic framework is a sound strategy for improving prediction results from imbalanced data. Clinical experts in kidney transplantation are advised to use the final calibrated model as a decision support system to predict the risk of graft failure for individual patients.
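To illustrate why a data-driven threshold can help when events are rare, the sketch below picks the cut-off that maximises Youden's J (sensitivity + specificity - 1). The abstract does not state which criterion the authors used to move the threshold, so Youden's J is an assumption made here for illustration, and the function name and toy data are hypothetical.

```python
def youden_threshold(probs, labels):
    """Return the cut-off (and its J) maximising sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):                     # candidate cut-offs
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= t)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:                               # keep the first best cut-off
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical imbalanced data: 2 events among 10 cases, all scores below 0.5.
toy_probs = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.35, 0.40]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
threshold, j_stat = youden_threshold(toy_probs, toy_labels)

# At the natural threshold of 0.5 no case is flagged, so both events are missed.
flagged_at_half = sum(1 for p in toy_probs if p >= 0.5)
print(threshold, j_stat, flagged_at_half)
```

In this toy example every predicted probability lies below 0.5, so the natural threshold flags nobody, while the selected cut-off catches both events at the cost of one false positive. Any threshold chosen this way should be selected on training or validation data and then checked on held-out cases.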

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/a6b2f2bc6d62/12911_2023_2185_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/8f5942aae0d1/12911_2023_2185_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/c10648ccd6de/12911_2023_2185_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/c2507951c9c8/12911_2023_2185_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/1df14328618c/12911_2023_2185_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/4ac1e9ee916c/12911_2023_2185_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/0272073ec197/12911_2023_2185_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/2c2e7edb3f9f/12911_2023_2185_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f056/10201795/95d94ab79850/12911_2023_2185_Fig9_HTML.jpg
