
Diabetes mellitus early warning and factor analysis using ensemble Bayesian networks with SMOTE-ENN and Boruta.

Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China.

Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China.

Publication information

Sci Rep. 2023 Aug 5;13(1):12718. doi: 10.1038/s41598-023-40036-5.

Abstract

Diabetes mellitus (DM) has become the third major chronic non-communicable disease after tumors and cardiovascular and cerebrovascular diseases, and is now one of the major public health issues worldwide. Detecting early-warning risk factors is key to preventing DM and has been the focus of several previous studies. From the perspective of residents' self-management and prevention, this study constructed Bayesian networks (BNs) that combine feature screening with multiple resampling techniques, applied to class-imbalanced DM monitoring data from Shanxi Province, China, in order to detect risk factors in chronic disease monitoring programs and predict the risk of DM. First, univariate analysis and the Boruta feature selection algorithm were used for preliminary screening of all included risk factors. Then, three resampling techniques (SMOTE, Borderline-SMOTE (BL-SMOTE), and SMOTE-ENN) were applied to address the class imbalance. Finally, BNs were learned from the processed data with three structure-learning algorithms (Tabu, hill-climbing, and MMHC) to identify the warning factors most strongly associated with DM. The results showed that BNs built on the processed data classified DM significantly more accurately. The BNs combined with SMOTE-ENN resampling improved the most, and BNs constructed with the Tabu algorithm achieved better classification performance than those built with hill-climbing or MMHC. The best-performing joint Boruta-SMOTE-ENN-Tabu model indicated that the risk factors for DM included family history, age, central obesity, hyperlipidemia, salt reduction, occupation, heart rate, and BMI.
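The SMOTE-ENN resampling step named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names, the toy two-feature data, the Euclidean distance, and the majority-vote cleaning rule are all assumptions made for illustration, and real monitoring data with categorical variables would need a suitable encoding or distance. SMOTE adds synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours; ENN then removes samples whose label disagrees with most of their nearest neighbours.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until both classes have equal size."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    if len(minority_pts) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    n_new = (len(X) - len(minority_pts)) - len(minority_pts)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority_pts)
        # Drop the closest hit, which is normally the base point itself.
        neighbours = k_nearest(base, minority_pts, k + 1)[1:]
        other = minority_pts[rng.choice(neighbours)]
        gap = rng.random()
        # New point lies on the segment between base and its neighbour.
        X_out.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority of k neighbours."""
    keep = []
    for i, point in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(point, X[j]))[:k]
        majority = Counter(y[j] for j in nearest).most_common(1)[0][0]
        if y[i] == majority:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE, then clean the result with ENN."""
    X_bal, y_bal = smote(X, y, minority, k=k, seed=seed)
    return edited_nearest_neighbours(X_bal, y_bal, k=k)


# Made-up toy data: label 1 is the minority class; the last label-0 point
# sits inside the minority cluster and acts as a noisy sample.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.0, 1.05)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = smote_enn(X, y, minority=1)
print("before:", Counter(y), "after:", Counter(y_res))
```

In this toy run the oversampling first balances the two classes, and the cleaning step then discards the noisy label-0 point that sits among the minority samples. Combining the two steps is meant to leave a training set that is both more balanced and less noisy near the class boundary.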

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a243/10404250/9248ff34448b/41598_2023_40036_Fig1_HTML.jpg
