Phuong Nguyen Dong, Tuyen Nguyen Trung, Linh Vu Thi Thai, Nguyen Nghi N, Nguyen Thanh Q
CIRTech Institute, HUTECH University, Ho Chi Minh City, Vietnam.
Faculty of Information Technology, HUTECH University, Ho Chi Minh City, Vietnam.
Bioinform Biol Insights. 2025 Jul 27;19:11779322251356563. doi: 10.1177/11779322251356563. eCollection 2025.
The kidneys are vital organs responsible for filtering and eliminating toxins from the body. Chronic kidney disease (CKD) is becoming increasingly prevalent, affecting not only older adults but also younger populations. To minimize kidney damage in those at risk, accurate assessment and monitoring of CKD are crucial. Machine learning models can assist physicians in this task by providing fast and accurate detection, and many health care systems have therefore adopted machine learning, especially for disease diagnosis. In this study, we developed a system to support the diagnosis of CKD. The data were obtained from the UCI Machine Learning Repository, and missing values were filled using 2 imputation methods: "mean/mode" and "random sampling." After data processing, we generated additional polynomial features to help the models generalize better. We then split the data using feature-based stratified splitting with K-means and trained 6 machine learning algorithms (Random Forest, Support Vector Machine [SVM], Naive Bayes, Logistic Regression, K-Nearest Neighbor [KNN], and XGBoost), comparing their performance by accuracy. Random Forest, XGBoost, SVM, and Logistic Regression achieved the highest accuracy (100%), followed by Naive Bayes (97%) and KNN (93%).
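The two imputation strategies and the polynomial feature step named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' implementation: the function names, the degree-2 default, and the toy values are assumptions made for illustration, and the K-means-based splitting and the classifiers are left out.

```python
import random
from itertools import combinations_with_replacement
from statistics import mean, mode


def impute_mean_mode(column):
    """Fill None with the mean (numeric column) or the mode (categorical column)."""
    observed = [v for v in column if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in observed)
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in column]


def impute_random_sample(column, rng):
    """Fill each None with a value drawn at random from the observed entries."""
    observed = [v for v in column if v is not None]
    return [rng.choice(observed) if v is None else v for v in column]


def polynomial_features(row, degree=2):
    """Return every monomial of the row's values from degree 1 up to `degree`."""
    features = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(row, d):
            product = 1.0
            for value in combo:
                product *= value
            features.append(product)
    return features


# Toy columns (hypothetical values, not taken from the CKD dataset).
age = [48, None, 62]
hypertension = ["yes", None, "yes", "no"]
creatinine = [1.2, None, 0.9, None]

print(impute_mean_mode(age))           # numeric column: gap filled with the mean
print(impute_mean_mode(hypertension))  # categorical column: gap filled with the mode
print(impute_random_sample(creatinine, random.Random(0)))
print(polynomial_features([2.0, 3.0]))  # x1, x2, x1^2, x1*x2, x2^2
```

Random-sample imputation keeps the spread of the observed values, whereas mean/mode imputation pulls every filled entry toward a single central value; which of the two suits a given column is a modelling choice that the abstract does not specify.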