Early detection of type 2 diabetes mellitus using machine learning-based prediction models.

Affiliations

Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, 6000, Koper, Slovenia.

Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia.

Publication information

Sci Rep. 2020 Jul 20;10(1):11981. doi: 10.1038/s41598-020-68771-z.

DOI:10.1038/s41598-020-68771-z

PMID:32686721

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7371679/

Abstract

Most screening tests for T2DM in use today were developed using multivariate regression methods that are often further simplified to allow transformation into a scoring formula. The increasing volume of electronically collected data has opened the opportunity to develop more complex, accurate prediction models that can be continuously updated using machine learning approaches. This study compares machine learning-based prediction models (Glmnet, RF, XGBoost and LightGBM) with commonly used regression models for the prediction of undiagnosed T2DM. Performance in predicting fasting plasma glucose level was measured using 100 bootstrap iterations on different subsets of data, simulating new incoming data in 6-month batches. With 6 months of data available, the simple regression model achieved the lowest average RMSE (0.838), followed by RF (0.842), LightGBM (0.846), Glmnet (0.859) and XGBoost (0.881). When more data were added, Glmnet improved at the highest rate (+3.4%). The highest level of variable selection stability over time was observed with LightGBM models. Our results show no clinically relevant improvement when more sophisticated prediction models were used. Since higher stability of selected variables over time contributes to simpler interpretation of the models, interpretability and model calibration should also be considered in the development of clinical prediction models.
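The evaluation scheme summarized in the abstract (average RMSE over 100 bootstrap iterations, repeated as more 6-month batches of data become available) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the single predictor `xs` is a hypothetical risk score, the batch sizes are made up, and scoring on out-of-bag observations is an assumed choice, since the abstract does not state how each bootstrap sample was evaluated.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_rmse(xs, ys, n_iter=100, seed=0):
    """Average out-of-bag RMSE over n_iter bootstrap resamples (assumed scheme)."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(n_iter):
        # Draw a training sample of size n with replacement.
        train_idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(train_idx)
        # Observations never drawn form the test set for this iteration.
        test_idx = [i for i in range(n) if i not in in_bag]
        if not test_idx:
            continue
        intercept, slope = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        preds = [intercept + slope * xs[i] for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def make_synthetic(n, seed=1):
    """Synthetic stand-in for (risk score, fasting plasma glucose) pairs."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [4.5 + 0.15 * x + rng.gauss(0, 0.8) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_synthetic(400)
    # Hypothetical batch sizes: 100 new records arrive every 6 months.
    for months, n_available in [(6, 100), (12, 200), (18, 300), (24, 400)]:
        avg = bootstrap_rmse(xs[:n_available], ys[:n_available])
        print(f"{months:>2} months, n={n_available}: mean bootstrap RMSE = {avg:.3f}")
```

Replacing `fit_line` with any other learner that exposes a fit and a predict step would allow the same loop to compare several models on identical resamples, which is the kind of comparison the abstract reports.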

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/488d/7371679/9b346c54c083/41598_2020_68771_Fig1_HTML.jpg

Similar articles

Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes.

Artif Intell Med. 2019 Jul;98:109-134. doi: 10.1016/j.artmed.2019.07.007. Epub 2019 Jul 26.

Development of a screening tool using electronic health records for undiagnosed Type 2 diabetes mellitus and impaired fasting glucose detection in the Slovenian population.

Diabet Med. 2018 May;35(5):640-649. doi: 10.1111/dme.13605. Epub 2018 Mar 15.

Machine Learning for the Prediction of New-Onset Diabetes Mellitus during 5-Year Follow-up in Non-Diabetic Patients with Cardiovascular Risks.

Yonsei Med J. 2019 Feb;60(2):191-199. doi: 10.3349/ymj.2019.60.2.191.

A machine learning-based diagnosis modelling of type 2 diabetes mellitus with environmental metal exposure.

Comput Methods Programs Biomed. 2023 Jun;235:107537. doi: 10.1016/j.cmpb.2023.107537. Epub 2023 Apr 5.

Artificial Intelligence (AI) based machine learning models predict glucose variability and hypoglycaemia risk in patients with type 2 diabetes on a multiple drug regimen who fast during ramadan (The PROFAST - IT Ramadan study).

Diabetes Res Clin Pract. 2020 Nov;169:108388. doi: 10.1016/j.diabres.2020.108388. Epub 2020 Aug 26.

Predictive modeling of blood pressure during hemodialysis: a comparison of linear model, random forest, support vector regression, XGBoost, LASSO regression and ensemble method.

Comput Methods Programs Biomed. 2020 Oct;195:105536. doi: 10.1016/j.cmpb.2020.105536. Epub 2020 May 22.

The product of fasting plasma glucose and triglycerides improves risk prediction of type 2 diabetes in middle-aged Koreans.

BMC Endocr Disord. 2018 May 30;18(1):33. doi: 10.1186/s12902-018-0259-x.

Predicting long-term type 2 diabetes with support vector machine using oral glucose tolerance test.

PLoS One. 2019 Dec 11;14(12):e0219636. doi: 10.1371/journal.pone.0219636. eCollection 2019.

Establishment of noninvasive diabetes risk prediction model based on tongue features and machine learning techniques.

Int J Med Inform. 2021 May;149:104429. doi: 10.1016/j.ijmedinf.2021.104429. Epub 2021 Feb 22.

Cited by

Type 2 Diabetes Prediction Model in China: A Five-Year Systematic Review.

Healthcare (Basel). 2025 Aug 15;13(16):2007. doi: 10.3390/healthcare13162007.

Predictive study of machine learning combined with serum Neuregulin 4 levels for hyperthyroidism in type II diabetes mellitus.

Front Oncol. 2025 Jul 16;15:1595553. doi: 10.3389/fonc.2025.1595553. eCollection 2025.

Improving T2D machine learning-based prediction accuracy with SNPs and younger age.

Comput Struct Biotechnol J. 2025 Jun 23;27:2772-2781. doi: 10.1016/j.csbj.2025.06.038. eCollection 2025.

Application of IRSA-BP neural network in diagnosing diabetes.

PLoS One. 2025 Jun 25;20(6):e0324759. doi: 10.1371/journal.pone.0324759. eCollection 2025.

Evolution of diabetes prediction using the fusion of ANOVA, ADASYN technique and XGBoost based on body composition data.

J Diabetes Metab Disord. 2025 Jun 17;24(2):151. doi: 10.1007/s40200-025-01661-1. eCollection 2025 Dec.

A Biomarker-Driven and Interpretable Machine Learning Model for Diagnosing Diabetes Mellitus.

Food Sci Nutr. 2025 Apr 30;13(5):e70234. doi: 10.1002/fsn3.70234. eCollection 2025 May.

A Machine Learning-Based Method for Developing the Chinese Symptom Checklist-11 (CSCL-11).

Behav Sci (Basel). 2025 Apr 2;15(4):459. doi: 10.3390/bs15040459.

Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.

Commun Med (Lond). 2025 Apr 22;5(1):103. doi: 10.1038/s43856-025-00819-5.

Machine learning and artificial intelligence in type 2 diabetes prediction: a comprehensive 33-year bibliometric and literature analysis.

Front Digit Health. 2025 Mar 27;7:1557467. doi: 10.3389/fdgth.2025.1557467. eCollection 2025.

Interpretable machine learning method to predict the risk of pre-diabetes using a national-wide cross-sectional data: evidence from CHNS.

BMC Public Health. 2025 Mar 26;25(1):1145. doi: 10.1186/s12889-025-22419-7.

References

A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models.

J Clin Epidemiol. 2019 Jun;110:12-22. doi: 10.1016/j.jclinepi.2019.02.004. Epub 2019 Feb 11.

Development of a screening tool using electronic health records for undiagnosed Type 2 diabetes mellitus and impaired fasting glucose detection in the Slovenian population.

Diabet Med. 2018 May;35(5):640-649. doi: 10.1111/dme.13605. Epub 2018 Mar 15.

Prediction of lung cancer patient survival via supervised machine learning classification techniques.

Int J Med Inform. 2017 Dec;108:1-8. doi: 10.1016/j.ijmedinf.2017.09.013. Epub 2017 Sep 25.

Comparative effects of microvascular and macrovascular disease on the risk of major outcomes in patients with type 2 diabetes.

Cardiovasc Diabetol. 2017 Jul 27;16(1):95. doi: 10.1186/s12933-017-0574-y.

Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study.

Sao Paulo Med J. 2017 May-Jun;135(3):234-246. doi: 10.1590/1516-3180.2016.0309010217.

Machine Learning and Data Mining Methods in Diabetes Research.

Comput Struct Biotechnol J. 2017 Jan 8;15:104-116. doi: 10.1016/j.csbj.2016.12.005. eCollection 2017.

Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View.

J Med Internet Res. 2016 Dec 16;18(12):e323. doi: 10.2196/jmir.5870.

The role of the poly(A) tract in the replication and virulence of tick-borne encephalitis virus.

Sci Rep. 2016 Dec 16;6:39265. doi: 10.1038/srep39265.

Why screen for type 2 diabetes?

Diabetes Res Clin Pract. 2016 Nov;121:215-217. doi: 10.1016/j.diabres.2016.11.004.

Replicating Cardiovascular Condition-Birth Month Associations.

Sci Rep. 2016 Sep 14;6:33166. doi: 10.1038/srep33166.
