Department of Epidemiology and Medical Statistics, Guangdong Medical University, Dongguan, Guangdong, China.
Preventive Medicine and Hygienics, Dongguan Center for Disease Control and Prevention, Dongguan, Guangdong, China.
BMJ Open. 2023 Aug 29;13(8):e069018. doi: 10.1136/bmjopen-2022-069018.
OBJECTIVES: To compare the predictive performance of six machine learning models for estimating the risk of type 2 diabetes mellitus (T2DM), and so provide a methodological reference for T2DM risk prediction.

SETTING AND PARTICIPANTS: This study used chronic disease risk factor surveillance data collected from Dongguan residents between 2016 and 2018. Multistage cluster random sampling was used at each monitoring site, and 4157 people were selected. Individuals with more than 20% missing data were excluded, leaving 4106 participants.

DESIGN: The K-nearest neighbour algorithm and the synthetic minority oversampling technique (SMOTE) were used to process the data. Univariate analysis was used for preliminary variable selection. 10-fold cross-validation was used to tune the parameters of some of the models. Accuracy, precision, recall and the area under the receiver operating characteristic curve (AUC) were used to evaluate predictive performance, and the DeLong test was used to compare the AUC values of the models.

RESULTS: After balancing, the sample size increased to 8013, comprising 4023 patients with T2DM and 3990 controls. Among the six models, the back propagation neural network performed best, with 93.7% accuracy, 94.6% precision, 92.8% recall and an AUC of 0.977, followed by the logistic regression, support vector machine, CART decision tree and C4.5 decision tree models. The deep neural network performed worst, with 84.5% accuracy, 86.1% precision, 82.9% recall and an AUC of 0.845.

CONCLUSIONS: Six T2DM risk prediction models were constructed, and their predictive performance was compared across several metrics. On the selected data set, the back propagation neural network achieved the best predictive performance.
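The evaluation metrics and the DeLong comparison named in the abstract can be made concrete with a short sketch. The pure-Python code below is not the authors' implementation: the function names, the 0.5 decision threshold and the toy scores are illustrative assumptions. It computes accuracy, precision and recall from predicted scores, the AUC as the Mann-Whitney statistic, and a two-sided DeLong test for two correlated AUCs measured on the same subjects, as when comparing two of the six models.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helper names for illustration; labels: 1 = T2DM, 0 = control.


def confusion_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Tuple[float, float, float]:
    """Accuracy, precision and recall at an assumed decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def _psi(pos: float, neg: float) -> float:
    # Mann-Whitney kernel: 1 if the case outranks the control, 0.5 on a tie.
    return 1.0 if pos > neg else 0.5 if pos == neg else 0.0


def delong_components(y_true, y_score) -> Tuple[float, List[float], List[float]]:
    """AUC plus DeLong structural components V10 (per case) and V01 (per control)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    # Sample covariance (n - 1 denominator).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true, score_a, score_b) -> Tuple[float, float, float, float]:
    """Two-sided DeLong test for the difference between two paired AUCs."""
    auc_a, v10_a, v01_a = delong_components(y_true, score_a)
    auc_b, v10_b, v01_b = delong_components(y_true, score_b)
    m, n = len(v10_a), len(v01_a)  # number of cases, number of controls
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


if __name__ == "__main__":
    # Invented toy labels and scores; not data from the study.
    y = [1, 1, 1, 0, 0, 0]
    model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    model_b = [0.7, 0.4, 0.2, 0.5, 0.1, 0.3]
    print(confusion_metrics(y, model_a))
    print(delong_test(y, model_a, model_b))
```

On this toy data model A has an AUC of 8/9 and model B an AUC of 2/3, but with only six subjects the DeLong p-value is about 0.37, so the difference is not significant. The pairwise loops cost O(m·n) comparisons, which is acceptable for illustration; larger data sets are usually handled with a rank-based formulation.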