Liu Qing, Zhou Qing, He Yifeng, Zou Jingui, Guo Yan, Yan Yaqiong
Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China.
School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China.
J Pers Med. 2022 Jun 27;12(7):1055. doi: 10.3390/jpm12071055.
Identifying people at high risk of developing diabetes among those with prediabetes may facilitate the implementation of targeted lifestyle and pharmacological interventions. We aimed to establish machine learning models based on demographic and clinical characteristics to predict the risk of incident diabetes. We used data from the free medical examination service project for elderly people aged 65 years or older to develop logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost) machine learning models for the follow-up results of 2019 and 2020, and performed internal validation. The receiver operating characteristic (ROC), sensitivity, specificity, accuracy, and F1 score were used to select the best-performing model. The average annual progression rate to diabetes among elderly people with prediabetes was 14.21%. Each model was trained using eight features and one outcome variable from 9607 individuals with prediabetes, and model performance was assessed in a further 2402 individuals with prediabetes. All four models predicted the first-year outcome better than the second-year outcome. The XGBoost model performed relatively well (ROC: 0.6742 for 2019 and 0.6707 for 2020). We established and compared four machine learning models to predict the risk of progression from prediabetes to diabetes. Although the four models differed little in performance, the XGBoost model had a relatively good ROC value and may perform well in future work in this field.
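The evaluation metrics named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. It assumes binary labels (1 = progressed to diabetes, 0 = did not), that the reported ROC values are areas under the ROC curve, and a hypothetical decision threshold of 0.5. The function names and the toy data are invented for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and F1 from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a random positive outscores a random negative,
    counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]  # hypothetical predicted risks
THRESHOLD = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Sensitivity, specificity, accuracy, and F1 depend on the chosen threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is one reason it is often used to compare models like the four considered here.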