Interpretable machine learning method to predict the risk of pre-diabetes using a national-wide cross-sectional data: evidence from CHNS.

Authors

Li Xiaolong, Ding Fan, Zhang Lu, Zhao Shi, Hu Zengyun, Ma Zhanbing, Li Feng, Zhang Yuhong, Zhao Yi, Zhao Yu

Affiliations

School of Public Health, Ningxia Medical University, Yinchuan Ningxia, 750004, China.

NHC Key Laboratory of Metabolic Cardiovascular Diseases Research, Ningxia Medical University, Yinchuan, 750004, China.

Publication

BMC Public Health. 2025 Mar 26;25(1):1145. doi: 10.1186/s12889-025-22419-7.

DOI: 10.1186/s12889-025-22419-7
PMID: 40140819
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11938594/

Abstract

OBJECTIVE

The incidence of Type 2 Diabetes Mellitus (T2DM) continues to rise steadily, significantly impacting human health. Early prediction of pre-diabetic risks has emerged as a crucial public health concern in recent years. Machine learning methods have proven effective in enhancing prediction accuracy. However, existing approaches may lack interpretability regarding underlying mechanisms. Therefore, we aim to employ an interpretable machine learning approach utilizing nationwide cross-sectional data to predict pre-diabetic risk and quantify the impact of potential risks.

METHODS

LASSO regression was used to select features from 30 candidate factors, identifying nine with non-zero coefficients that were associated with pre-diabetes: age, triglycerides (TG), total cholesterol (TC), body mass index (BMI), apolipoprotein B (ApoB), total protein (TP), leukocyte count (WBC), high-density lipoprotein cholesterol (HDL-C), and hypertension. Seven machine learning algorithms were compared on predictive performance: Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), Artificial Neural Networks (ANNs), Decision Trees (DT), and Logistic Regression (LR). An interpretable machine learning approach was used to enhance the accuracy of pre-diabetes risk prediction and to quantify the impact and significance of each potential risk factor.
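
To illustrate how an L1 penalty leaves only a few features with non-zero coefficients, here is a minimal sketch of LASSO fitted by cyclic coordinate descent. It is not the authors' implementation: the squared-error loss, the synthetic data, the penalty strength `alpha`, and all function names are assumptions made for illustration, and the abstract does not state which LASSO variant or software was used.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j itself.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w


# Synthetic data (hypothetical): six candidate features, only features 0 and 2 matter.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.2)
selected = [j for j, coef in enumerate(weights) if coef != 0.0]
print("non-zero coefficient features:", selected)
```

The soft-thresholding step is what sets weak coefficients to exactly zero, which is why the surviving features can be read off as the selected set. A binary outcome such as pre-diabetes status would normally call for an L1-penalised logistic loss instead of the squared error used here.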

RESULTS

From the China Health and Nutrition Survey (CHNS) data, a cohort of 8,277 individuals was selected, with a disease prevalence of 7.13%. The XGBoost model performed best, with an AUC of 0.939, surpassing the RF, SVM, DT, ANN, NB, and LR models. Additionally, Shapley Additive Explanation (SHAP) analysis indicated that age, BMI, TC, apolipoprotein B (ApoB), TG, hypertension, TP, HDL-C, and leukocyte count (WBC) may serve as risk factors for pre-diabetes.
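
The AUC reported above is a rank-based metric, which matters at a prevalence of 7.13%: a classifier that labels everyone as negative would already be 92.87% accurate while ranking nobody. The sketch below computes AUC directly from its definition, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are made-up illustrative values, not study data, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = case) and model scores.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90, 0.20, 0.60]

auc = roc_auc(labels, scores)            # 12 of 16 positive/negative pairs ranked correctly
auc_constant = roc_auc(labels, [0.5] * 8)  # a constant score carries no ranking information
print(auc, auc_constant)
```

The pairwise loop is quadratic in the sample size, which is fine for a demonstration; a sort-based computation would be the usual choice for a dataset of several thousand individuals.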

CONCLUSION

The constructed model comprises nine easily accessible predictive factors, which prove highly effective in forecasting the risk of pre-diabetes. Concurrently, we have quantified the specific impact of each predictive factor on the risk and ranked them based on their influence. This result may serve as a convenient tool for early identification of individuals at high risk of pre-diabetes, providing effective guidance for preventing the progression of pre-diabetes to T2DM.
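
Quantifying and ranking each predictor's contribution, as described above, rests on Shapley values. The following sketch computes exact Shapley values by enumerating feature subsets for a toy linear risk score. Everything in it is hypothetical: the three-feature model, its coefficients, the individual, and the baseline are invented for illustration and are not taken from the paper. Practical SHAP tooling typically relies on faster model-specific or approximate algorithms, since exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


def toy_risk_score(x):
    """Hypothetical linear score; the coefficients are invented, not fitted."""
    return 0.03 * x["age"] + 0.10 * x["bmi"] + 0.50 * x["tg"]


person = {"age": 60, "bmi": 28, "tg": 2.0}      # hypothetical individual
reference = {"age": 45, "bmi": 23, "tg": 1.2}   # hypothetical baseline values

phi = shapley_values(toy_risk_score, person, reference)
ranking = sorted(phi, key=lambda k: abs(phi[k]), reverse=True)
print(phi, ranking)
```

For a linear score each value reduces to the coefficient times the feature's deviation from the baseline, and the values sum to the difference between the individual's score and the baseline score. That additivity is what allows per-feature contributions to be compared and ranked.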

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/7d898a2db9f4/12889_2025_22419_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/384207a81e39/12889_2025_22419_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/1019822bb8e2/12889_2025_22419_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/e2f27d049a75/12889_2025_22419_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/536c4d48ea5a/12889_2025_22419_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/c12f5e3b3ebb/12889_2025_22419_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/add1b2f3811c/12889_2025_22419_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/c981f59240fb/12889_2025_22419_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d01/11938594/f82c04807c3e/12889_2025_22419_Fig9_HTML.jpg
