Electronics and Communication Engineering Discipline, Khulna University, Khulna, 9208, Bangladesh.
Department of ICT Integrated Ocean Smart Cities Engineering, Dong-A University, Busan, 49315, South Korea.
Sci Rep. 2023 Apr 17;13(1):6263. doi: 10.1038/s41598-023-33525-0.
Chronic kidney disease (CKD) is a condition characterized by structural and functional changes to the kidney over time. Studies show that 10% of adults worldwide are affected by some form of CKD, resulting in 1.2 million deaths. CKD has recently emerged as a leading cause of mortality worldwide, making it necessary to develop a Computer-Aided Diagnostic (CAD) system that diagnoses CKD automatically. A Machine Learning (ML) based CAD system can be used by clinicians to automatically diagnose large numbers of people. Since ML models are considered black boxes, it is also necessary to expose the influential factors behind a model's prediction of a particular output, so that a doctor can make a more rational decision based on the model's output and an analysis of each feature's influence on the model. In this paper, we use XGBoost as the ML classifier to predict whether a patient has CKD. Using the XGBoost classifier, we obtained an accuracy, precision, recall, and F1 score of [Formula: see text] and [Formula: see text] respectively using all [Formula: see text] features. Furthermore, we used the Biogeography Based Optimization (BBO) algorithm to find an effective subset of the features. The BBO algorithm selected almost half of the initial features. We obtained an accuracy, precision, recall, and F1 score of [Formula: see text] and [Formula: see text] respectively using only the 13 features selected by the BBO algorithm. Finally, we explained the impact of the features on the ML model using SHapley Additive exPlanations (SHAP) analysis. Using SHAP analysis and the BBO algorithm, we found that hemoglobin and albumin contribute most to the detection of CKD.
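For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 score are computed for a binary CKD / not-CKD classifier. The function name `binary_metrics` and the toy labels are illustrative assumptions; they are not the authors' code, data, or results.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = CKD, 0 = not CKD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, made up purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision penalizes false alarms (healthy patients flagged as CKD), while recall penalizes missed cases; F1 is their harmonic mean, which is why all four are usually reported together for a diagnostic classifier.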