Diabetes mellitus risk prediction in the presence of class imbalance using flexible machine learning methods.

Affiliations

Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, P.O. Box 14155-6446, Tehran, Iran.

Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Publication information

BMC Med Inform Decis Mak. 2022 Feb 10;22(1):36. doi: 10.1186/s12911-022-01775-z.

DOI:10.1186/s12911-022-01775-z
PMID:35139846
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8830137/
Abstract

BACKGROUND

Early detection and prediction of type 2 diabetes mellitus incidence from baseline measurements could reduce associated complications in the future. Because the incidence of diabetes is low relative to non-diabetes, accurately predicting the minority diabetes class is challenging.

METHODS

The performance of a deep neural network (DNN), extreme gradient boosting (XGBoost), and random forest (RF) was compared in predicting the minority diabetes class in Tehran Lipid and Glucose Study (TLGS) cohort data. Changing the classification threshold, cost-sensitive learning, and over- and under-sampling strategies were compared as solutions to class imbalance for improving the algorithms' performance.
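As an illustration of the cost-sensitive learning mentioned above, the following minimal sketch derives class weights from label counts. The abstract does not state which weighting scheme the study used, so both formulas here are assumptions: the inverse-frequency rule n_samples / (n_classes * class_count), and the negative-to-positive ratio that is commonly passed as a positive-class weight. The function names and the toy labels are hypothetical and are not TLGS data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed scheme for illustration; the study's weights may differ.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def negative_to_positive_ratio(labels, positive=1):
    """Ratio of negative to positive examples, a common positive-class weight."""
    pos = sum(1 for y in labels if y == positive)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive examples in labels")
    return neg / pos


# Toy labels: 9 non-diabetes (0) and 1 diabetes (1). Hypothetical data.
toy_labels = [0] * 9 + [1]
weights = balanced_class_weights(toy_labels)
ratio = negative_to_positive_ratio(toy_labels)
print(weights, ratio)
```

With weights like these, an error on the rare diabetes class costs more during training than an error on the common class, which is the general idea behind weighting without resampling the data.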

RESULTS

On the original imbalanced data, DNN had the highest accuracy in predicting diabetes (54.8%) and outperformed XGBoost and RF in terms of AUROC, g-mean, and F1-measure. Changing the threshold to the value that maximizes the F1-measure improved g-mean and F1-measure in all three algorithms. Repeated edited nearest neighbors (RENN) under-sampling for DNN and cost-sensitive learning for the tree-based algorithms were the best solutions to the imbalance issue. In DNN, RENN increased the ROC AUC, Precision-Recall AUC, g-mean, and F1-measure from 0.857, 0.603, 0.713, and 0.575 to 0.862, 0.608, 0.773, and 0.583, respectively. Weighting improved g-mean and F1-measure from 0.667 and 0.554 to 0.776 and 0.588 in XGBoost, and from 0.659 and 0.543 to 0.775 and 0.566 in RF. In RF, the ROC and Precision-Recall AUCs also increased from 0.840 and 0.578 to 0.846 and 0.591, respectively.
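The two threshold-dependent metrics reported above, and the idea of choosing the threshold that maximizes the F1-measure, can be sketched as follows. This is a minimal illustration under stated assumptions: g-mean is taken as the square root of sensitivity times specificity, candidate thresholds are the distinct predicted scores, and the labels and scores are made-up toy values rather than study data. The abstract does not say on which data split the threshold was tuned.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_measure(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def apply_threshold(scores, threshold):
    """Predict the positive class when the score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def best_f1_threshold(y_true, scores):
    """Scan the distinct scores and return (threshold, f1) with the highest F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_measure(y_true, apply_threshold(scores, t))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical toy data: 3 positives among 10 cases.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]

default_pred = apply_threshold(scores, 0.5)
best_t, best_f1 = best_f1_threshold(y_true, scores)
tuned_pred = apply_threshold(scores, best_t)
print(best_t, best_f1, g_mean(y_true, default_pred), g_mean(y_true, tuned_pred))
```

On this toy example the tuned threshold falls below 0.5, so one more true positive is recovered at the cost of one false positive, and both F1 and g-mean rise compared with the default cutoff.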

CONCLUSION

G-mean showed the largest increase under all imbalance solutions. Weighting and changing the threshold are efficient strategies and are faster than resampling methods for handling class imbalance. Among the sampling strategies, under-sampling methods performed better than the others.
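To make the under-sampling idea concrete, here is a rough sketch of repeated edited nearest neighbors (RENN). It assumes one common formulation: a majority-class point is dropped when most of its k nearest neighbors (Euclidean distance, k = 3) belong to another class, and the pass is repeated until nothing more is removed. The study's actual implementation and settings are not given in the abstract and may differ; the function names and toy points are hypothetical.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, majority=0, k=3):
    """One ENN pass: drop majority points outvoted by their k nearest neighbours."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority:
            keep.append(i)  # minority points are never removed
            continue
        # Distances from point i to every other point, nearest first.
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def repeated_enn(X, y, majority=0, k=3, max_iter=100):
    """Repeat ENN until a pass removes nothing (or max_iter is reached)."""
    for _ in range(max_iter):
        X_new, y_new = edited_nearest_neighbours(X, y, majority, k)
        if len(X_new) == len(X):
            break
        X, y = X_new, y_new
    return X, y


# Hypothetical 1-D toy data: one majority point (2.1) sits inside the minority cluster.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (2.1,), (2.0,), (2.2,), (2.3,)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = repeated_enn(X, y)
print(len(X_res), Counter(y_res))
```

The effect is to clean majority-class points out of regions dominated by the minority class, which sharpens the class boundary rather than simply discarding majority examples at random.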


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2819/8830137/a2e1ad8d307a/12911_2022_1775_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2819/8830137/8606a6b4f4bb/12911_2022_1775_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2819/8830137/61d2803ee23c/12911_2022_1775_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2819/8830137/5b463c67956e/12911_2022_1775_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2819/8830137/fb41a189d4e1/12911_2022_1775_Fig5_HTML.jpg
