Shams Mahmoud Y, Tarek Zahraa, Elshewey Ahmed M
Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
Faculty of Computers and Information, Computer Science Department, Mansoura University, Mansoura, 35561, Egypt.
Sci Rep. 2025 Jan 6;15(1):982. doi: 10.1038/s41598-024-82420-9.
Diabetes is a long-term condition characterized by elevated blood sugar levels. It can lead to a variety of complex disorders such as stroke, renal failure, and heart attack. Because diabetes cannot be cured and adds significant complications to our health-care system, machine learning support is especially valuable for diagnosing it at an early stage. The PIMA Indian diabetes dataset (PIDD), which has been used for classification in several studies, contains 768 instances and 9 features: eight predictors and one target. First, we performed a preprocessing stage consisting of mean imputation and data normalization. We then trained several models on the extracted features: the Machine Learning (ML) models Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Histogram Gradient Boosting (HGB), as well as a Gated Recurrent Unit (GRU) model. To classify the PIDD, this paper proposes a new model called Recursive Feature Elimination-GRU (RFE-GRU). RFE selects the features in the training dataset that are most important for predicting the target variable, while the GRU, trained on the features that RFE retains, addresses the vanishing and exploding gradient problems. To verify and validate the RFE-GRU model, we evaluated it with several predictive metrics: it achieved a precision of 90.50%, a recall of 90.70%, an F1-score of 90.50%, an accuracy of 90.70%, and an Area Under the Curve (AUC) of 0.9278. Comparative results showed that the RFE-GRU model outperforms the other classification models.
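The abstract names the preprocessing and feature-selection steps but not their exact configuration. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that zeros in the Glucose, BloodPressure, SkinThickness, Insulin and BMI columns are treated as missing (a common PIDD convention that the abstract does not confirm), that "normalization" means min-max scaling, and that RFE ranks features by the absolute weights of a plain logistic regression. The helper names (`mean_impute`, `min_max_normalize`, `fit_logistic`, `rfe`, `demo`) and the `n_keep` parameter are hypothetical. The demo runs on synthetic data rather than the PIDD, and the GRU classifier that the paper trains on the selected features is not reproduced here.

```python
import numpy as np

# PIDD columns where a 0 is physiologically implausible and is commonly read as
# "missing" (ASSUMPTION: the abstract does not say which columns were imputed):
# Glucose, BloodPressure, SkinThickness, Insulin, BMI.
ZERO_AS_MISSING = [1, 2, 3, 4, 5]


def mean_impute(X, cols):
    """Return a copy of X with zeros in `cols` replaced by that column's mean of non-zero values."""
    X = np.array(X, dtype=float)  # np.array copies, so the caller's data is untouched
    for c in cols:
        col = X[:, c]  # a view: writing into it updates X
        missing = col == 0
        if missing.any() and not missing.all():
            col[missing] = col[~missing].mean()
    return X


def min_max_normalize(X):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def fit_logistic(X, y, lr=1.0, epochs=300):
    """Batch gradient-descent logistic regression; returns the weight vector.

    Columns are centered first: with a free intercept this leaves the optimal
    weights unchanged and only makes gradient descent converge faster.
    """
    Xc = X - X.mean(axis=0)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xc @ w + b)))
        err = p - y
        w -= lr * Xc.T @ err / len(y)
        b -= lr * err.mean()
    return w


def rfe(X, y, n_keep):
    """Recursive Feature Elimination: refit, drop the feature with the smallest |weight|, repeat.

    Comparing |weights| is meaningful here because every column is on the same [0, 1] scale.
    """
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w = fit_logistic(X[:, remaining], y)
        remaining.pop(int(np.argmin(np.abs(w))))
    return remaining


def demo(seed=0):
    """Synthetic 8-feature data (NOT the PIDD) whose label depends only on columns 1 and 5."""
    rng = np.random.default_rng(seed)
    n = 400
    X = rng.normal(size=(n, 8))
    y = (2.0 * X[:, 1] + 1.5 * X[:, 5] + 0.3 * rng.normal(size=n) > 0).astype(float)
    X[rng.choice(n, size=20, replace=False), 4] = 0.0  # fake "missing" entries in a noise column
    X = min_max_normalize(mean_impute(X, ZERO_AS_MISSING))
    return rfe(X, y, n_keep=2)


if __name__ == "__main__":
    print("selected columns:", demo())
```

The same elimination loop is available in scikit-learn as `sklearn.feature_selection.RFE`, which repeatedly refits a supplied estimator and drops the features with the smallest `coef_` or `feature_importances_`. The selected columns would then form the input to the GRU stage.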