Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China.
College of Public Health, Xinjiang Medical University, Urumqi, China.
J Diabetes Res. 2020 Sep 24;2020:6873891. doi: 10.1155/2020/6873891. eCollection 2020.
An estimated 425 million people worldwide have diabetes, which accounts for 12% of global health expenditure. The number continues to grow, placing a heavy burden on healthcare systems, especially in remote, underserved areas.
A total of 584,168 adults who had participated in the national physical examination were enrolled in this study. Risk factors for type II diabetes mellitus (T2DM) were identified by P values and odds ratios using logistic regression (LR) on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied four machine learning classifiers to identify individuals with T2DM: a decision tree, a random forest, AdaBoost with a decision tree base learner (AdaBoost), and an extreme gradient boosting decision tree (XGBoost). We compared the performance of the four classifiers and used the best-performing one to output variable importance scores for T2DM.
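The LR screening step ranks candidate variables by odds ratio and P value. As a minimal sketch, assuming a single binary predictor (for example, fatty liver yes/no), the univariate logistic regression coefficient equals the log odds ratio of the 2x2 table, and its Wald standard error equals the Woolf standard error, so both quantities can be computed directly from cell counts. The function names, the 0.05 threshold, and the rule "keep a variable if P < alpha and the 95% CI excludes 1" are illustrative assumptions, not the authors' stated selection criteria; the study itself used multivariable LR on many variables.

```python
import math


def odds_ratio_wald(a, b, c, d):
    """Odds ratio, 95% CI and two-sided Wald P value from a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    For one binary predictor, the logistic-regression coefficient is
    log(OR) and its standard error is the Woolf SE computed here.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal-approximation P value
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return math.exp(log_or), ci, p


def select_risk_factors(tables, alpha=0.05):
    """Hypothetical screening rule: keep variables with P < alpha
    whose 95% CI for the odds ratio does not contain 1."""
    selected = []
    for name, (a, b, c, d) in tables.items():
        odds, (lo, hi), p = odds_ratio_wald(a, b, c, d)
        if p < alpha and not (lo <= 1.0 <= hi):
            selected.append(name)
    return selected
```

With made-up counts such as `{"fatty_liver": (300, 700, 200, 1800), "noise": (100, 100, 100, 100)}`, only `fatty_liver` would be kept, since the second table has an odds ratio of exactly 1.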
The results indicated that XGBoost performed best (accuracy = 0.906, precision = 0.910, recall = 0.902, F1 = 0.906, and AUC = 0.968). The variable importance scores from XGBoost showed that BMI was the most important feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drinking amount, smoking status, and diet habit (preference for oily food).
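For reference, the five reported metrics follow standard definitions: accuracy, precision, and recall come from confusion-matrix counts, F1 is the harmonic mean of precision and recall, and AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counted as one half). The sketch below uses only those definitions; the function names are illustrative, and the pairwise AUC loop is quadratic, so for a cohort of this size a rank-based computation would be the practical choice. As a consistency check, the reported precision (0.910) and recall (0.902) give F1 ≈ 0.906, matching the stated value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


def roc_auc(scores_pos, scores_neg):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (the Mann-Whitney formulation)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, `round(f1_score(0.910, 0.902), 3)` reproduces the reported F1 of 0.906, and `roc_auc([0.9, 0.8], [0.1, 0.85])` returns 0.75 because three of the four positive/negative pairs are ranked correctly.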
We propose an LR-XGBoost classifier that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for diabetes risk at an early phase, and the variable importance scores offer clues for preventing the onset of diabetes.