Prediction of diabetes disease using an ensemble of machine learning multi-classifier models.

Affiliations

Department of Statistics, Science and Research Branch, Islamic Azad University, Tehran, Iran.

School of Mathematics, Iran University of Science and Technology, Tehran, Iran.

Publication information

BMC Bioinformatics. 2023 Sep 12;24(1):337. doi: 10.1186/s12859-023-05465-z.

Abstract

BACKGROUND AND OBJECTIVE

Diabetes is a life-threatening chronic disease with a growing global prevalence, necessitating early diagnosis and treatment to prevent severe complications. Machine learning has emerged as a promising approach for diabetes diagnosis, but challenges such as limited labeled data, frequent missing values, and dataset imbalance hinder the development of accurate prediction models. Therefore, a novel framework is required to address these challenges and improve performance.

METHODS

In this study, we propose an innovative pipeline-based multi-classification framework to predict diabetes in three classes: diabetic, non-diabetic, and prediabetes, using the imbalanced Iraqi Patient Dataset of Diabetes. Our framework incorporates various pre-processing techniques, including duplicate sample removal, attribute conversion, missing value imputation, data normalization and standardization, feature selection, and k-fold cross-validation. Furthermore, we implement multiple machine learning models, such as k-NN, SVM, DT, RF, AdaBoost, and GNB, and introduce a weighted ensemble approach based on the Area Under the Receiver Operating Characteristic Curve (AUC) to address dataset imbalance. Performance optimization is achieved through grid search and Bayesian optimization for hyper-parameter tuning.
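The AUC-weighted ensemble idea can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so this example assumes each model's weight is its macro-averaged one-vs-rest AUC on a validation split, normalized so the weights sum to one, and that the ensemble prediction is the argmax of the weighted class probabilities. The function names, the two toy models `a` and `b`, and all probability values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Proba = List[List[float]]  # one row of class probabilities per sample


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(proba: Proba, y_true: List[int], n_classes: int) -> float:
    """Unweighted mean of the one-vs-rest AUCs, one per class."""
    aucs = []
    for c in range(n_classes):
        scores = [row[c] for row in proba]
        labels = [1 if y == c else 0 for y in y_true]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes


def auc_weights(val_proba: Dict[str, Proba], y_val: List[int],
                n_classes: int) -> Dict[str, float]:
    """Assumed scheme: weight = validation AUC / sum of all validation AUCs."""
    aucs = {m: macro_ovr_auc(p, y_val, n_classes) for m, p in val_proba.items()}
    total = sum(aucs.values())
    return {m: a / total for m, a in aucs.items()}


def weighted_vote(proba_by_model: Dict[str, Proba],
                  weights: Dict[str, float]) -> List[int]:
    """Soft voting: argmax of the weight-averaged class probabilities."""
    names = list(proba_by_model)
    n_samples = len(proba_by_model[names[0]])
    n_classes = len(proba_by_model[names[0]][0])
    preds = []
    for i in range(n_samples):
        combined = [sum(weights[m] * proba_by_model[m][i][c] for m in names)
                    for c in range(n_classes)]
        preds.append(max(range(n_classes), key=combined.__getitem__))
    return preds


# Toy validation data (hypothetical): 0 = non-diabetic, 1 = prediabetes, 2 = diabetic
y_val = [0, 1, 2, 0, 1, 2]
val_proba = {
    "a": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8],
          [0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]],
    "b": [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4],
          [0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.4, 0.4, 0.2]],
}
weights = auc_weights(val_proba, y_val, n_classes=3)
preds = weighted_vote(val_proba, weights)
```

In this toy case the better-ranked model `a` receives the larger weight, so its probabilities dominate the vote. Because AUC depends on how samples are ranked rather than on how many fall in each class, it is a plausible weighting signal for an imbalanced dataset, which matches the motivation stated in the abstract.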

RESULTS

Our proposed model outperforms other machine learning models, including k-NN, SVM, DT, RF, AdaBoost, and GNB, in predicting diabetes. The model achieves high average accuracy, precision, recall, F1-score, and AUC values of 0.9887, 0.9861, 0.9792, 0.9851, and 0.999, respectively.
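For readers unfamiliar with how multi-class scores of this kind are computed, the following is a minimal sketch. The abstract does not say whether "average" means an average over classes, over cross-validation folds, or both; this example assumes a macro average over the three classes. The function name and the toy labels are hypothetical and do not reproduce the paper's data or results.

```python
from typing import List, Tuple


def macro_metrics(y_true: List[int], y_pred: List[int],
                  n_classes: int) -> Tuple[float, float, float, float]:
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Undefined ratios (no predictions or no samples for a class) count as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return (accuracy,
            sum(precisions) / n_classes,
            sum(recalls) / n_classes,
            sum(f1s) / n_classes)


# Toy labels (hypothetical): class 0 is the majority class.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
accuracy, precision, recall, f1 = macro_metrics(y_true, y_pred, n_classes=3)
```

Macro averaging gives every class the same influence regardless of its size, so a model that neglects a small class is penalized in precision, recall, and F1 even when overall accuracy stays high.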

CONCLUSION

Our pipeline-based multi-classification framework demonstrates promising results in accurately predicting diabetes using an imbalanced dataset of Iraqi diabetic patients. The proposed framework addresses the challenges associated with limited labeled data, missing values, and dataset imbalance, leading to improved prediction performance. This study highlights the potential of machine learning techniques in diabetes diagnosis and management, and the proposed framework can serve as a valuable tool for accurate prediction and improved patient care. Further research can build upon our work to refine and optimize the framework and explore its applicability in diverse datasets and populations.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/39aa/10496262/58dd69d45acb/12859_2023_5465_Fig1_HTML.jpg
