Prediction of Type 2 Diabetes Risk and Its Effect Evaluation Based on the XGBoost Model.

Authors

Wang Liyang, Wang Xiaoya, Chen Angxuan, Jin Xian, Che Huilian

Affiliations

Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China.

College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China.

Publication information

Healthcare (Basel). 2020 Jul 31;8(3):247. doi: 10.3390/healthcare8030247.

Abstract

Given the harm diabetes causes to the population, we introduce an ensemble learning algorithm, eXtreme Gradient Boosting (XGBoost), to predict the risk of type 2 diabetes, and compare it with Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbor (K-NN) algorithms, with the aim of improving on the predictive performance of existing models. Using a combination of convenience sampling and snowball sampling in Xicheng District, Beijing, we conducted a questionnaire survey of 380 middle-aged and elderly people covering personal data, eating habits, exercise status and family medical history. We then trained the models and obtained a disease risk index for each sample using 10-fold cross-validation. Of the algorithms compared, XGBoost performed best, with an average accuracy of 0.8909 and an area under the receiver operating characteristic curve (AUC) of 0.9182. On this dataset, XGBoost therefore showed higher prediction accuracy and better generalization than the other algorithms tested for predicting type 2 diabetes risk, which we attribute to its architecture, and which may support intelligent prevention and control of diabetes in the future.
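The evaluation procedure described in the abstract (a per-sample risk score obtained through 10-fold cross-validation, then summarized as average accuracy and AUC) can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's code: the data are synthetic, the feature count and the 0.5 decision threshold are arbitrary, and a small hand-written K-NN scorer stands in for the models so that the example needs only the Python standard library. Pooling the out-of-fold scores into one AUC is also an assumption; the abstract does not say how the AUC was aggregated.

```python
import random
from statistics import mean


def make_synthetic_data(n=380, n_features=4, seed=0):
    """Synthetic stand-in for questionnaire features; NOT the study's data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Positive cases are shifted by +1.0 on every feature.
        X.append([rng.gauss(label * 1.0, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_risk(train_X, train_y, x, k=15):
    """Risk score: fraction of positives among the k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    return sum(label for _, label in dists[:k]) / k


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k_folds=10):
    """Return out-of-fold risk scores, mean per-fold accuracy, and pooled AUC."""
    risk = [None] * len(X)
    fold_acc = []
    for test_idx in k_fold_indices(len(X), k_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = 0
        for i in test_idx:
            # Each sample is scored by a model that never saw it in training.
            risk[i] = knn_risk(train_X, train_y, X[i])
            correct += int((risk[i] >= 0.5) == (y[i] == 1))
        fold_acc.append(correct / len(test_idx))
    return risk, mean(fold_acc), auc(risk, y)


X, y = make_synthetic_data()
risk, mean_acc, pooled_auc = cross_validate(X, y)
print(f"mean accuracy={mean_acc:.3f}  AUC={pooled_auc:.3f}")
```

To compare several algorithms in this way, `knn_risk` would be replaced by each model's fit-and-score step (for example a library implementation of XGBoost, SVM or Random Forest) while the fold assignment is kept fixed, so that every model is evaluated on the same held-out samples.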

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef4a/7551910/a8ab8b8fc1ef/healthcare-08-00247-g001.jpg
