Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier.

Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China.

Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China.

Publication information

BMC Med Inform Decis Mak. 2021 Mar 20;21(1):105. doi: 10.1186/s12911-021-01471-4.

DOI:10.1186/s12911-021-01471-4
PMID:33743696
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7980612/
Abstract

BACKGROUND

Diabetes mellitus (DM) has become the third major chronic non-communicable disease affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and is now one of the world's major public health problems. Identifying individuals at high risk for DM is therefore essential to establishing prevention strategies.

METHODS

Medical data often have a high-dimensional feature space, high feature redundancy, and imbalanced classes. To address these problems, this study explored different supervised classifiers combined with SVM-SMOTE and two feature dimensionality reduction methods (logistic stepwise regression and LASSO) to classify diabetes survey sample data with imbalanced categories and complex related factors. The classification results of 4 supervised classifiers were analyzed and discussed under 4 data processing methods. Five indicators (Accuracy, Precision, Recall, F1-Score and AUC) were selected as the key indicators for evaluating the performance of the classification models.
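To make the resampling idea concrete, the following is a minimal sketch of the interpolation step used by plain SMOTE, on which SVM-SMOTE builds. It is not the authors' implementation: SVM-SMOTE additionally uses an SVM to choose which minority samples to interpolate from, and that step is omitted here. The function name `smote_oversample`, the parameters `k` and `seed`, and the toy (age, BMI) values are all hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (plain SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical toy data: (age, BMI) for the minority (diabetic) class.
minority = [(52.0, 27.5), (61.0, 30.1), (58.0, 29.0), (47.0, 26.2)]
new_points = smote_oversample(minority, n_new=6)
print(len(new_points))
```

Each synthetic point lies on a line segment between two real minority samples, so the oversampled class stays within the range of the observed data rather than duplicating records.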

RESULTS

The Random Forest Classifier combined with SVM-SMOTE resampling and LASSO feature screening (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) proved the best approach for identifying individuals at high risk of DM. The combined algorithm also improved classification performance in predicting people at high risk of DM. Age, region, heart rate, hypertension, hyperlipidemia and BMI were the six most important characteristic variables affecting diabetes.
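As a quick consistency check on the reported metrics, the F1-Score can be recomputed from the reported Precision and Recall using the standard harmonic-mean definition. The helper name `f1_score` is illustrative only; the two input values are taken from the results above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported for the best-performing model in the abstract.
precision, recall = 0.869, 0.919
f1 = f1_score(precision, recall)
print(round(f1, 3))  # 0.893
```

The recomputed value matches the reported F1-Score of 0.893 to three decimal places.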

CONCLUSIONS

The Random Forest Classifier combined with SVM-SMOTE and LASSO feature reduction performed best in identifying individuals at high risk of DM. The combined method proposed in this study could be a good tool for early screening of DM.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/218d/7980612/b3ba82f0d81c/12911_2021_1471_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/218d/7980612/c1fabf112f72/12911_2021_1471_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/218d/7980612/2d7518a18086/12911_2021_1471_Fig3_HTML.jpg
