College of Physical Education, Shenzhen University, Shenzhen 518000, China.
College of Physical Education, Southwest University, Chongqing 400715, China.
Int J Environ Res Public Health. 2022 Nov 15;19(22):15027. doi: 10.3390/ijerph192215027.
The prevalence of diabetes has been increasing in recent years, and previous research has found machine-learning models to be effective tools for diabetes prediction. The purpose of this study was to compare the efficacy of five machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. Data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables, were obtained from the 1999-2020 NHANES database. To screen the training data for the machine-learning models, the Akaike Information Criterion (AIC) forward propagation algorithm was used. Five machine-learning models were developed to predict diabetes: CATBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five models, dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes. In terms of model performance, CATBoost ranked higher than RF, LR, XGBoost, and SVM, achieving an accuracy of 82.1% and an area under the ROC curve (AUC) of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying patients with diabetes.
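The evaluation metrics named in the abstract (accuracy, sensitivity, specificity, precision, F1 score, and AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) are hypothetical, the labels use an assumed encoding of 1 for diabetes and 0 for no diabetes, and the toy inputs in the usage section are invented for illustration rather than drawn from NHANES.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # diabetes, predicted diabetes
    fp: int  # no diabetes, predicted diabetes
    tn: int  # no diabetes, predicted no diabetes
    fn: int  # diabetes, predicted no diabetes


def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = diabetes)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Assumption for this sketch: report 0.0 when a metric is undefined.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c):
    """Derive the threshold-based metrics from confusion-matrix counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # true positive rate (recall)
    precision = _ratio(c.tp, c.tp + c.fp)    # positive predictive value
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data: 4 cases with diabetes, 6 without.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5, 0.05]
    print(classification_metrics(confusion_counts(y_true, y_pred)))
    print(roc_auc(y_true, scores))
```

The pairwise AUC computation is quadratic in the number of cases. It is adequate for a small illustration, but for a sample the size of the one in this study a rank-based implementation from an established library would be the more practical choice.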