Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods.

Affiliations

Department of Sports Industry Studies, Yonsei University, Seoul, Republic of Korea.

Frontier Research Institute of Convergence Sports Science, Yonsei University, Seoul, Republic of Korea.

Publication information

Sci Rep. 2023 Aug 11;13(1):13101. doi: 10.1038/s41598-023-40170-0.

DOI: 10.1038/s41598-023-40170-0

PMID: 37567907

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10421881/

Abstract

We compared the predictive performance of machine learning-based models for undiagnosed diabetes with that of traditional statistics-based models. We used data from the 2014-2020 Korean National Health and Nutrition Examination Survey (KNHANES) (N = 32,827). The KNHANES 2014-2018 data were used as training and internal validation sets and the 2019-2020 data as external validation sets. The area under the receiver operating characteristic curve (AUC) was used to compare the two approaches. Using sex, age, resting heart rate, and waist circumference as features, the machine learning-based model showed a higher AUC than the traditional statistics-based model (0.788 vs. 0.740). Using sex, age, waist circumference, family history of diabetes, hypertension, alcohol consumption, and smoking status as features, the machine learning-based model again showed a higher AUC (0.802 vs. 0.759). The machine learning-based model using the features chosen for maximum prediction performance also showed a higher AUC (0.819 vs. 0.765). Machine learning-based prediction models using anthropometric and lifestyle measurements may outperform traditional statistics-based prediction models in predicting undiagnosed diabetes.
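The abstract compares models by AUC. As a rough illustration of what that metric measures, the sketch below computes AUC directly from its pairwise definition: the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The labels, the scores, and the names `roc_auc`, `scores_model_a`, and `scores_model_b` are hypothetical toy values chosen for this example. They are not KNHANES data and do not reproduce the study's models or results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy illustration only: hypothetical labels (1 = undiagnosed diabetes)
# and hypothetical risk scores from two models on the same held-out cases.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores_model_a = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.90]
scores_model_b = [0.30, 0.60, 0.35, 0.20, 0.80, 0.55, 0.50, 0.90]

auc_a = roc_auc(y_true, scores_model_a)
auc_b = roc_auc(y_true, scores_model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

Both models are scored on the same held-out cases, mirroring the idea of evaluating competing models on a common validation set. The pairwise loop is quadratic in the number of cases, so a rank-based implementation would be preferable for a survey-sized dataset.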

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb3a/10421881/e705b1576043/41598_2023_40170_Fig1_HTML.jpg
