Department of Medical Informatics and Statistics, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan.
Department of Prevention of Noncommunicable Diseases and Promotion of Health Checkup, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan.
Front Public Health. 2024 Nov 1;12:1495054. doi: 10.3389/fpubh.2024.1495054. eCollection 2024.
Chronic kidney disease (CKD) is characterized by a decreased glomerular filtration rate or renal injury (especially proteinuria) for at least 3 months. The early detection and treatment of CKD, a major global public health concern, before the onset of symptoms is important. This study aimed to develop machine learning models to predict the risk of developing CKD within 1 and 5 years using health examination data.
Data were collected from patients who underwent annual health examinations between 2017 and 2022. Among the 30,273 participants included in the study, 1,372 had CKD. Demographic characteristics, body mass index, blood pressure, blood and urine test results, and questionnaire responses were used to predict the risk of CKD development at 1 and 5 years. This study examined three outcomes: incident estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m², the development of proteinuria, and incident eGFR <60 mL/min/1.73 m² or the development of proteinuria. Logistic regression (LR), conditional logistic regression, neural network, and recurrent neural network were used to develop the prediction models.
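The three outcomes can be read as labels derived from a baseline visit and a follow-up visit. The sketch below is one possible reading, not the authors' code: the `Visit` type, the field names, and the reduction of the urine test to a positive/negative flag are assumptions made for illustration, and "incident" is taken to mean absent at baseline and present at follow-up.

```python
from dataclasses import dataclass

# Cut-off named in the abstract, in mL/min/1.73 m^2.
EGFR_THRESHOLD = 60.0


@dataclass
class Visit:
    """One health examination (hypothetical structure, not from the paper)."""
    egfr: float        # estimated GFR, mL/min/1.73 m^2
    proteinuria: bool  # assumption: urine test already reduced to positive/negative


def incident_outcomes(baseline: Visit, follow_up: Visit) -> dict:
    """Label the three outcomes for one participant.

    'Incident' is read here as: absent at baseline, present at follow-up.
    """
    low_egfr = baseline.egfr >= EGFR_THRESHOLD and follow_up.egfr < EGFR_THRESHOLD
    new_proteinuria = (not baseline.proteinuria) and follow_up.proteinuria
    return {
        "egfr_below_60": low_egfr,
        "proteinuria": new_proteinuria,
        "egfr_below_60_or_proteinuria": low_egfr or new_proteinuria,
    }


labels = incident_outcomes(Visit(egfr=75.0, proteinuria=False),
                           Visit(egfr=58.0, proteinuria=False))
```

Here the participant crosses the eGFR threshold without developing proteinuria, so the first and third labels are true and the second is false.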
All models had predictive values, sensitivities, and specificities >0.8 for predicting the onset of CKD in 1 year when the outcome was eGFR <60 mL/min/1.73 m². The area under the receiver operating characteristic curve (AUROC) was >0.9. With LR and a neural network, the specificities were 0.749 and 0.739 and AUROCs were 0.889 and 0.890, respectively, for predicting onset within 5 years. The AUROCs of most models were approximately 0.65 when the outcome was eGFR <60 mL/min/1.73 m² or proteinuria. The predictive performance of all models exhibited a significant decrease when eGFR was not included as an explanatory variable (AUROCs: 0.498-0.732).
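For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and AUROC can be computed from predicted risks. The toy labels, the scores, and the 0.5 decision threshold are illustrative assumptions, not values from the study. AUROC is computed here as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) at a given decision threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y and predicted_positive:
            tp += 1
        elif y:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, y_score):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y]
    neg = [s for y, s in zip(y_true, y_score) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 1 = developed the outcome, 0 = did not.
y_true = [1, 1, 0, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_score)
auc = auroc(y_true, y_score)
```

Unlike sensitivity and specificity, AUROC does not depend on the chosen threshold, and a value near 0.5 (such as the lower end of the range reported when eGFR was excluded) corresponds to ranking no better than chance.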
Machine learning models can predict the risk of CKD, and eGFR plays a crucial role in predicting the onset of CKD. However, it is difficult to predict the onset of proteinuria based solely on health examination data. Further studies must be conducted to predict the decline in eGFR and increase in urine protein levels.