Exploratory study on classification of diabetes mellitus through a combined Random Forest Classifier.

Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China.

Shanxi Centre for Disease Control and Prevention, Taiyuan, 030012, Shanxi, China.

Publication information

BMC Med Inform Decis Mak. 2021 Mar 20;21(1):105. doi: 10.1186/s12911-021-01471-4.

Abstract

BACKGROUND

Diabetes mellitus (DM) has become the third major chronic non-communicable disease threatening patients, after tumors and cardiovascular and cerebrovascular diseases, and one of the major public health problems worldwide. Identifying individuals at high risk of DM is therefore of great importance for establishing DM prevention strategies.

METHODS

Medical data often have a high-dimensional feature space with substantial feature redundancy, and they frequently suffer from class imbalance. To address these problems, this study explored different supervised classifiers combined with SVM-SMOTE resampling and two feature dimensionality reduction methods (logistic stepwise regression and LASSO) to classify diabetes survey sample data with imbalanced categories and complex related factors. The classification results of 4 supervised classifiers under 4 data processing methods were analyzed and discussed. Five indicators, namely Accuracy, Precision, Recall, F1-Score and AUC, were selected as the key indicators for evaluating the performance of the classification models.
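
SMOTE-style oversampling rebalances the classes by creating synthetic minority-class (here, DM) samples instead of duplicating existing ones. The pure-Python sketch below shows only the interpolation step shared by the SMOTE family: each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours. SVM-SMOTE, as implemented for example by imbalanced-learn's `SVMSMOTE`, additionally fits an SVM and generates samples around minority-class support vectors near the class boundary; that selection step is omitted here. The function name `smote_interpolate`, the choice `k=3`, and the toy data are illustrative assumptions, not the study's settings.

```python
import math
import random


def smote_interpolate(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority-class samples by SMOTE interpolation.

    Each synthetic point is base + gap * (neighbour - base), with gap drawn
    uniformly from [0, 1), where neighbour is one of the k nearest
    minority-class samples to base (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority-class samples")
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda q: math.dist(base, q),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class; request five synthetic samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote_interpolate(minority, n_new=5)
```

Because every synthetic sample is a convex combination of two real minority samples, it always falls inside the region spanned by the original minority class.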

RESULTS

According to the results, the Random Forest Classifier combined with SVM-SMOTE resampling and the LASSO feature screening method (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) proved the best approach for identifying individuals at high risk of DM. In addition, the combined algorithm improves classification performance in predicting high-risk individuals. Age, region, heart rate, hypertension, hyperlipidemia and BMI are the six most critical feature variables affecting diabetes.
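
The five indicators can all be computed from confusion-matrix counts and from the predicted scores of the positive (high-risk) class. In this minimal pure-Python sketch the function names are illustrative, not taken from the paper; AUC uses the rank formulation, i.e. the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The last line checks that the reported F1-Score follows from the reported Precision and Recall, since F1 is their harmonic mean.

```python
def f1_from(precision, recall):
    """F1-Score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1 for binary labels (1 = high risk of DM)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def auc_rank(y_true, scores):
    """ROC AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Consistency check with the abstract: F1 from Precision = 0.869 and Recall = 0.919
print(round(f1_from(0.869, 0.919), 3))  # prints 0.893
```

Recall (sensitivity) is the share of truly high-risk individuals that the model flags, which is why it is of particular interest for a screening tool.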

CONCLUSIONS

The Random Forest Classifier combined with SVM-SMOTE and the LASSO feature reduction method performs best in identifying individuals at high risk of DM, and the combined method proposed in this study would be a good tool for early screening of DM.
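
LASSO reduces the feature set by adding an L1 penalty that drives the coefficients of weak predictors to exactly zero; the features that keep non-zero coefficients are retained. The sketch below assumes a squared-error LASSO on centred data, solved by cyclic coordinate descent with soft-thresholding. For a binary outcome such as DM status an L1-penalised logistic regression is the more typical variant, and the abstract does not state which variant or penalty value the authors used; in practice the penalty `lam` is usually tuned by cross-validation. The function names and toy data are hypothetical.

```python
def soft_threshold(a, lam):
    """S(a, lam) = sign(a) * max(|a| - lam, 0): the LASSO shrinkage operator."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes every column of X and the response y are already centred,
    so no intercept is fitted. X is a list of rows, y a list of floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0:
                b[j] = 0.0  # constant-zero column carries no information
                continue
            # correlation of feature j with the partial residual (fit without feature j)
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


def lasso_select(X, y, lam):
    """Indices of features whose LASSO coefficient is not shrunk to zero."""
    coefs = lasso_coordinate_descent(X, y, lam)
    return [j for j, c in enumerate(coefs) if c != 0.0]
```

For example, with the centred toy data `X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]` and `y = [2.1, -1.9, 1.9, -2.1]`, where the response depends strongly on the first feature and only weakly on the second, `lasso_select(X, y, lam=0.5)` keeps only feature 0, while a smaller penalty of 0.05 keeps both.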

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/218d/7980612/b3ba82f0d81c/12911_2021_1471_Fig1_HTML.jpg
