He Jiayi, Wang Xin, Zhu Peiqi, Wang Xiaoxu, Zhang Yitong, Zhao Jing, Sun Wei, Hu Kongfa, He Weiming, Xie Jiadong
School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China.
Jiangsu Province Engineering Research Center of TCM Intelligence Health Service, Nanjing University of Chinese Medicine, Nanjing, China.
EClinicalMedicine. 2025 Jun 11;84:103286. doi: 10.1016/j.eclinm.2025.103286. eCollection 2025 Jun.
Chronic kidney disease (CKD) has become a significant global public health problem, affecting approximately 10% of adults. Because early-stage CKD often lacks obvious symptoms, it is difficult to diagnose in a timely manner, and the disease can progress gradually to end-stage renal disease (ESRD). This study applied machine learning (ML) methods to integrate patient clinical data and developed an early CKD prediction model applicable to individuals [...]. The model aims to [...], thereby [...].
This retrospective multicenter study, conducted in China, included patients with CKD and healthy individuals who underwent physical examinations between February 2021 and April 2024. Six ML methods, including Decision Tree, Multilayer Perceptron, and XGBoost, were used to predict CKD from different combinations of features drawn from routine blood tests, urinalysis, and blood biochemistry. Multiple evaluation metrics, including the area under the receiver operating characteristic curve (AUC) and the F1 score, were used to compare prediction performance. The SHAP (SHapley Additive exPlanations) interpretability method was applied to assess feature importance and explain the final model's results.
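The two named evaluation metrics can be illustrated with a minimal, dependency-free sketch. It is not the study's evaluation code: the function names `roc_auc` and `f1_score`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 as the harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = CKD, 0 = non-CKD; scores are predicted risks.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
# Assumed threshold of 0.5 to turn risks into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(f"AUC = {roc_auc(toy_labels, toy_scores):.4f}")
print(f"F1  = {f1_score(toy_labels, toy_preds):.4f}")
```

AUC is computed from the ranking of predicted risks and does not depend on a threshold, whereas F1 depends on the threshold chosen to turn risks into class labels, which is one reason to report both.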
Data from three hospitals were used, with the dataset divided into training and internal validation sets (CKD: 11,436 cases; non-CKD: 10,004 cases) and an external validation set (CKD: 350 cases; non-CKD: 473 cases). Among the six ML models, XGBoost performed best. Among the feature combinations, "blood routine + urinalysis + basic information" yielded the best performance (AUC = 0.9235, external validation AUC = 0.8962). In addition, a web tool was developed to facilitate the application of early CKD risk prediction in clinical practice.
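The cohort sizes reported above imply different class balances in the development and external validation data. The short sketch below works through that arithmetic; the dictionary layout and the helper name `ckd_fraction` are illustrative assumptions, and only the four case counts come from the abstract.

```python
# Case counts as reported in the abstract.
cohorts = {
    "training + internal validation": {"ckd": 11436, "non_ckd": 10004},
    "external validation": {"ckd": 350, "non_ckd": 473},
}


def ckd_fraction(counts: dict) -> float:
    """Share of CKD cases among all cases in one cohort."""
    return counts["ckd"] / (counts["ckd"] + counts["non_ckd"])


# Total number of cases across both cohorts.
total_cases = sum(c["ckd"] + c["non_ckd"] for c in cohorts.values())

for name, counts in cohorts.items():
    size = counts["ckd"] + counts["non_ckd"]
    print(f"{name}: n = {size}, CKD share = {ckd_fraction(counts):.1%}")
print(f"all cohorts: n = {total_cases}")
```

The CKD share is a little over half in the development data and a little over two fifths in the external set. Threshold-based metrics such as F1 are sensitive to this kind of shift in class balance, so differences between internal and external F1 values should be read with that in mind.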
This study applied an interpretable ML model to predict early CKD effectively. Even with the relatively low-cost "blood routine + urinalysis + basic information" combination, the model still demonstrated high prediction accuracy. The method has potential for clinical application and may help identify early CKD, reducing the risk of disease progression.
This research was supported by the National Key Research and Development Program of China (2023YFC3502903, 2022YFC3502302), National Natural Science Foundation of China (82074580), and Science and Technology Project of Jiangsu Provincial Research Institute of Chinese Medicine Schools (JSZYLP2024011).