Department of Sports Industry Studies, Yonsei University, Seoul, Republic of Korea.
Frontier Research Institute of Convergence Sports Science, Yonsei University, Seoul, Republic of Korea.
Sci Rep. 2023 Aug 11;13(1):13101. doi: 10.1038/s41598-023-40170-0.
We compared the prediction performance of machine learning-based models for predicting undiagnosed diabetes with that of traditional statistics-based prediction models. We used data from the 2014-2020 Korean National Health and Nutrition Examination Survey (KNHANES) (N = 32,827). The KNHANES 2014-2018 data were used for training and internal validation, and the 2019-2020 data for external validation. Prediction performance of the machine learning-based and traditional statistics-based models was compared using the area under the receiver operating characteristic curve (AUC). Using sex, age, resting heart rate, and waist circumference as features, the machine learning-based model showed a higher AUC than the traditional statistics-based model (0.788 vs. 0.740). Using sex, age, waist circumference, family history of diabetes, hypertension, alcohol consumption, and smoking status as features, the machine learning-based model again showed a higher AUC (0.802 vs. 0.759). With the feature set that maximized prediction performance, the machine learning-based model also showed a higher AUC (0.819 vs. 0.765). Machine learning-based prediction models using anthropometric and lifestyle measurements may outperform traditional statistics-based prediction models in predicting undiagnosed diabetes.
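The evaluation design above (a temporal split by survey year, with models compared by AUC) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The function names `roc_auc` and `split_by_year` are hypothetical. The AUC is computed in its rank-based form, as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The toy labels and scores are invented and do not reproduce the study's data or results.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as 0.5. This is the O(n_pos * n_neg) pairwise form,
    used here for clarity rather than speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_by_year(
    records: List[Dict], train_years: Iterable[int], external_years: Iterable[int]
) -> Tuple[List[Dict], List[Dict]]:
    """Temporal split (hypothetical helper): earlier survey years for model
    development, later years held out for external validation."""
    train_set, external_set = set(train_years), set(external_years)
    train = [r for r in records if r["year"] in train_set]
    external = [r for r in records if r["year"] in external_set]
    return train, external


if __name__ == "__main__":
    # Invented toy records, one per survey year 2014-2020.
    records = [{"year": y} for y in range(2014, 2021)]
    dev, ext = split_by_year(records, range(2014, 2019), range(2019, 2021))
    print(len(dev), len(ext))  # 5 2

    # Invented labels (1 = undiagnosed diabetes) and model risk scores.
    labels = [0, 0, 1, 1]
    print(roc_auc(labels, [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Comparing two models then amounts to calling `roc_auc` on the same external-validation labels with each model's predicted risks; the higher value indicates better discrimination on that held-out set.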