Diabetes mellitus early warning and factor analysis using ensemble Bayesian networks with SMOTE-ENN and Boruta.

Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China.

Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China.

Publication information

Sci Rep. 2023 Aug 5;13(1):12718. doi: 10.1038/s41598-023-40036-5.

DOI: 10.1038/s41598-023-40036-5
PMID: 37543637
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10404250/
Abstract

Diabetes mellitus (DM) has become the third major chronic non-infectious disease after tumors and cardiovascular and cerebrovascular diseases, and one of the major public health issues worldwide. Detecting early-warning risk factors for DM is key to its prevention and has been the focus of several previous studies. From the perspective of residents' self-management and prevention, this study constructed Bayesian networks (BNs) that combine feature screening with multiple resampling techniques, applied to class-imbalanced DM monitoring data from Shanxi Province, China, to detect risk factors in chronic disease monitoring programs and to predict the risk of DM. First, univariate analysis and the Boruta feature selection algorithm were used for preliminary screening of all included risk factors. Then, three resampling techniques, SMOTE, Borderline-SMOTE (BL-SMOTE) and SMOTE-ENN, were adopted to deal with the data imbalance. Finally, BNs were learned from the processed data with three structure-learning algorithms (Tabu, hill-climbing and MMHC) to find the warning factors that correlate strongly with DM. The results showed that BNs constructed from the processed data significantly improved the accuracy of DM classification. In particular, BNs combined with SMOTE-ENN resampling improved the most, and BNs constructed with the Tabu algorithm achieved better classification performance than those built with the hill-climbing and MMHC algorithms. The best-performing joint Boruta-SMOTE-ENN-Tabu model showed that the risk factors for DM included family history, age, central obesity, hyperlipidemia, salt reduction, occupation, heart rate, and BMI.
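
The SMOTE-ENN resampling step named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`nearest`, `smote`, `enn`, `smote_enn`), the toy two-dimensional data, the binary-label assumption, and the neighbour counts (k=5 for SMOTE, k=3 for ENN) are all illustrative assumptions. The sketch covers only the resampling idea; Boruta screening and Bayesian network structure learning are not shown.

```python
import math
import random

def nearest(idx, points, candidates, k):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]

def smote(X, y, minority, k=5, seed=0):
    """Oversample the minority class by interpolating between minority neighbours.

    Assumes binary labels: synthetic points are added until both classes
    have the same number of samples.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(min_idx)                    # random minority sample
        j = rng.choice(nearest(i, X, min_idx, k))  # one of its minority neighbours
        gap = rng.random()                         # position along the segment i -> j
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new

def enn(X, y, k=3):
    """Edited nearest neighbours: drop samples that disagree with their k-NN vote."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = [y[j] for j in nearest(i, X, all_idx, k)]
        if max(set(votes), key=votes.count) == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def smote_enn(X, y, minority, seed=0):
    """SMOTE oversampling followed by ENN cleaning of both classes."""
    X_bal, y_bal = smote(X, y, minority, seed=seed)
    return enn(X_bal, y_bal)

# Toy data (hypothetical): 20 majority points on a grid, 5 minority points in a
# distant cluster, and 1 majority-labelled noise point inside that cluster.
grid = [(float(i), float(j)) for i in range(5) for j in range(4)]
noise = [(10.4, 10.6)]
cluster = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5)]
X = grid + noise + cluster
y = [0] * 21 + [1] * 5

X_clean, y_clean = smote_enn(X, y, minority=1)
print(y_clean.count(0), y_clean.count(1))  # prints "20 21"
```

On this toy data SMOTE raises the minority class from 5 to 21 samples, and ENN then removes the single majority-labelled point that sits inside the minority cluster. In practice a maintained library would normally be used instead of hand-written code; for example, the imbalanced-learn package provides a `SMOTEENN` class in `imblearn.combine`.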

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a243/10404250/ed5a6d4630ba/41598_2023_40036_Fig4a_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a243/10404250/9248ff34448b/41598_2023_40036_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a243/10404250/fb3208f786b6/41598_2023_40036_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a243/10404250/a0a7ce54aad3/41598_2023_40036_Fig3_HTML.jpg

Similar articles

1
Diabetes mellitus early warning and factor analysis using ensemble Bayesian networks with SMOTE-ENN and Boruta.
Sci Rep. 2023 Aug 5;13(1):12718. doi: 10.1038/s41598-023-40036-5.
2
Application of a novel hybrid algorithm of Bayesian network in the study of hyperlipidemia related factors: a cross-sectional study.
BMC Public Health. 2021 Jul 12;21(1):1375. doi: 10.1186/s12889-021-11412-5.
3
Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier.
BMC Med Inform Decis Mak. 2021 Mar 20;21(1):105. doi: 10.1186/s12911-021-01471-4.
4
Using Bayesian networks with Tabu-search algorithm to explore risk factors for hyperhomocysteinemia.
Sci Rep. 2023 Jan 28;13(1):1610. doi: 10.1038/s41598-023-28123-z.
5
A hybrid resampling algorithms SMOTE and ENN based deep learning models for identification of Marburg virus inhibitors.
Future Med Chem. 2022 May;14(10):701-715. doi: 10.4155/fmc-2021-0290. Epub 2022 Apr 8.
6
Using Bayesian networks with tabu algorithm to explore factors related to chronic kidney disease with mental illness: A cross-sectional study.
Math Biosci Eng. 2023 Aug 10;20(9):16194-16211. doi: 10.3934/mbe.2023723.
7
Identification of Orphan Genes in Unbalanced Datasets Based on Ensemble Learning.
Front Genet. 2020 Oct 2;11:820. doi: 10.3389/fgene.2020.00820. eCollection 2020.
8
Prevalence of hyperlipidemia in Shanxi Province, China and application of Bayesian networks to analyse its related factors.
Sci Rep. 2018 Feb 28;8(1):3750. doi: 10.1038/s41598-018-22167-2.
9
Exploring influencing factors of chronic obstructive pulmonary disease based on elastic net and Bayesian network.
Sci Rep. 2022 May 9;12(1):7563. doi: 10.1038/s41598-022-11125-8.
10
A hybrid sampling algorithm combining synthetic minority over-sampling technique and edited nearest neighbor for missed abortion diagnosis.
BMC Med Inform Decis Mak. 2022 Dec 29;22(1):344. doi: 10.1186/s12911-022-02075-2.

Cited by

1
An Interpretable Machine Learning Model Based on Inflammatory-Nutritional Biomarkers for Predicting Metachronous Liver Metastases After Colorectal Cancer Surgery.
Biomedicines. 2025 Jul 12;13(7):1706. doi: 10.3390/biomedicines13071706.
2
Development of a risk prediction model for sepsis-related delirium based on multiple machine learning approaches and an online calculator.
PLoS One. 2025 Jul 16;20(7):e0323831. doi: 10.1371/journal.pone.0323831. eCollection 2025.
3
Risk warning model for predicting sleep disorders in healthcare workers on long-term shifts.
Sleep Biol Rhythms. 2025 Apr 10;23(3):331-342. doi: 10.1007/s41105-025-00583-y. eCollection 2025 Jul.
4
An explainable machine learning model for predicting the risk of distant metastasis in intrahepatic cholangiocarcinoma: a population-based cohort study.
Discov Oncol. 2025 Jun 18;16(1):1140. doi: 10.1007/s12672-025-02952-y.
5
Development and validation of a machine learning model for in-hospital mortality prediction in children under 5 years with heart failure.
Front Pediatr. 2025 May 26;13:1608334. doi: 10.3389/fped.2025.1608334. eCollection 2025.
6
Predicting liver metastasis in pancreatic neuroendocrine tumors with an interpretable machine learning algorithm: a SEER-based study.
Front Med (Lausanne). 2025 May 1;12:1533132. doi: 10.3389/fmed.2025.1533132. eCollection 2025.
7
Assessing the predictive value of time-in-range level for the risk of postoperative infection in patients with type 2 diabetes: a cohort study.
Front Endocrinol (Lausanne). 2025 Apr 15;16:1539039. doi: 10.3389/fendo.2025.1539039. eCollection 2025.
8
Construction of a risk prediction model for lung infection after chemotherapy in lung cancer patients based on the machine learning algorithm.
Front Oncol. 2024 Aug 9;14:1403392. doi: 10.3389/fonc.2024.1403392. eCollection 2024.
9
Optimization of diabetes prediction methods based on combinatorial balancing algorithm.
Nutr Diabetes. 2024 Aug 14;14(1):63. doi: 10.1038/s41387-024-00324-z.
10
Advances in Flavonoid Research: Sources, Biological Activities, and Developmental Prospectives.
Curr Issues Mol Biol. 2024 Mar 26;46(4):2884-2925. doi: 10.3390/cimb46040181.

References

1
A diabetes prediction model based on Boruta feature selection and ensemble learning.
BMC Bioinformatics. 2023 Jun 1;24(1):224. doi: 10.1186/s12859-023-05300-5.
2
A hybrid sampling algorithm combining synthetic minority over-sampling technique and edited nearest neighbor for missed abortion diagnosis.
BMC Med Inform Decis Mak. 2022 Dec 29;22(1):344. doi: 10.1186/s12911-022-02075-2.
3
Development and assessment of novel machine learning models to predict medication non-adherence risks in type 2 diabetics.
Front Public Health. 2022 Nov 17;10:1000622. doi: 10.3389/fpubh.2022.1000622. eCollection 2022.
4
Prediction of type 2 diabetes using genome-wide polygenic risk score and metabolic profiles: A machine learning analysis of population-based 10-year prospective cohort study.
EBioMedicine. 2022 Dec;86:104383. doi: 10.1016/j.ebiom.2022.104383. Epub 2022 Nov 30.
5
Detecting High-Risk Factors and Early Diagnosis of Diabetes Using Machine Learning Methods.
Comput Intell Neurosci. 2022 Sep 29;2022:2557795. doi: 10.1155/2022/2557795. eCollection 2022.
6
Healthy lifestyle, metabolomics and incident type 2 diabetes in a population-based cohort from Spain.
Int J Behav Nutr Phys Act. 2022 Jan 27;19(1):8. doi: 10.1186/s12966-021-01219-3.
7
Using machine learning methods to predict hepatic encephalopathy in cirrhotic patients with unbalanced data.
Comput Methods Programs Biomed. 2021 Nov;211:106420. doi: 10.1016/j.cmpb.2021.106420. Epub 2021 Sep 16.
8
Risk Identification of Bronchopulmonary Dysplasia in Premature Infants Based on Machine Learning.
Front Pediatr. 2021 Aug 17;9:719352. doi: 10.3389/fped.2021.719352. eCollection 2021.
9
Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches.
Int J Environ Res Public Health. 2021 Jul 9;18(14):7346. doi: 10.3390/ijerph18147346.
10
Trends in Prevalence of Diabetes and Control of Risk Factors in Diabetes Among US Adults, 1999-2018.
JAMA. 2021 Jun 25;326(8):1-13. doi: 10.1001/jama.2021.9883.