A Biomarker-Driven and Interpretable Machine Learning Model for Diagnosing Diabetes Mellitus.

Author information

Xiao Zhihui, Wang Mingfu, Zhao Yueliang, Wang Hui

Affiliations

College of Food Science and Technology, Shanghai Ocean University, Shanghai, China.

Shenzhen Key Laboratory of Food Nutrition and Health, College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, China.

Publication information

Food Sci Nutr. 2025 Apr 30;13(5):e70234. doi: 10.1002/fsn3.70234. eCollection 2025 May.

DOI:10.1002/fsn3.70234

PMID:40313792

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12041655/

Abstract

Diabetes is one of the leading causes of death and disability worldwide, and earlier, more accurate diagnostic methods are crucial for its clinical prevention and treatment. Here, data on biochemical indicators and physiological characteristics of 4335 participants were collected from the National Health and Nutrition Examination Survey (NHANES) database for 2017 to 2020. After preprocessing, the dataset was randomly divided into a training set (70%) and a test set (30%), and the Boruta algorithm was used to screen feature indicators on the training set. Three machine learning algorithms, Random Forest (RF), Multi-Layer Perceptron (MLP), and Extreme Gradient Boosting (XGBoost), were then used to build predictive models with 10-fold cross-validation on the training set, followed by performance evaluation on the test set. The RF model performed best, with an area under the curve (AUC) of 0.958 (95% CI: 0.943-0.973), a recall of 0.897, a specificity of 0.916, an F1 score of 0.747, and an overall accuracy of 0.913. SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDP) were then applied to interpret the RF model and analyze the risk factors for diabetes. Glycohemoglobin, glucose, fasting glucose, age, cholesterol, osmolality, BMI, blood urea nitrogen, and insulin were found to exert the greatest influence on the prevalence of diabetes. Collectively, the RF model shows considerable promise for the diagnosis of diabetes and can serve as a valuable supplementary tool for clinical diagnosis and risk assessment.
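The abstract reports recall, specificity, F1 score, and accuracy together, so it may help to see how these relate. The sketch below is not the authors' code. It assumes the standard binary-classification definitions, with diabetes as the positive class, and uses a hypothetical confusion matrix rather than the study's data. The names `ConfusionMatrix` and `precision_from_f1` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; positive class = diabetes."""
    tp: int  # diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    fp: int  # non-diabetic, predicted diabetic
    tn: int  # non-diabetic, predicted non-diabetic

    def recall(self) -> float:
        # Share of true diabetic cases that the model detects (sensitivity).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-diabetic cases that the model correctly rules out.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of positive predictions that are truly diabetic.
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


def precision_from_f1(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall R."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical imbalanced test set (NOT the study's data):
# 100 diabetic and 900 non-diabetic participants.
cm = ConfusionMatrix(tp=90, fn=10, fp=90, tn=810)
names = ("recall", "specificity", "precision", "f1", "accuracy")
metrics = {name: round(getattr(cm, name)(), 3) for name in names}
print(metrics)

# Back-of-envelope check using the rounded values quoted in the abstract.
implied = round(precision_from_f1(0.747, 0.897), 3)
print(implied)
```

In the hypothetical example, recall, specificity, and accuracy are all 0.9, yet precision is 0.5 and F1 is about 0.643, because the positive class is small and false positives weigh heavily on precision. Applied to the rounded figures in the abstract, and assuming the F1 score is the usual positive-class harmonic mean, the same relation suggests a precision of roughly 0.64. That would be consistent with a test set in which diabetic participants are a minority, although the abstract does not state the class balance.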

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/38f0/12041655/20b8acf9dec0/FSN3-13-e70234-g006.jpg
