Al-Hussein Faten, Abdollahian Mali, Tafakori Laleh, Al-Shali Khalid
School of Science, RMIT University, Melbourne, Victoria, Australia.
Department of Mathematics and Statistics, College of Sciences, University of Jeddah, Jeddah, Saudi Arabia.
PLoS One. 2025 Jun 17;20(6):e0326315. doi: 10.1371/journal.pone.0326315. eCollection 2025.
Type 2 diabetes (T2D) is a significant global health concern, and glycated hemoglobin (HbA1c) is recognized as the most reliable indicator for its diagnosis. Genetic, familial, environmental, and health-behavior factors are associated with the disease. T2D carries substantial economic costs and human suffering, making it a primary concern for health planners, physicians, and people living with the disease. Saudi Arabia currently ranks seventh worldwide in T2D prevalence, yet the country lacks focused research on the disease. This study aims to develop hybrid prediction models that combine the strengths of multiple algorithms to improve HbA1c prediction accuracy while minimizing the number of significant Key Performance Indicators (KPIs). The proposed model can help healthcare practitioners diagnose T2D at an early stage. Analyses were conducted in a case-control study in Saudi Arabia comparing cases (patients with HbA1c ≥ 6.5%) with controls with normal HbA1c levels (< 6.5%). Medical records of 3,000 patients at King Abdulaziz University Hospital, containing demographic, lifestyle, and lipid profile data, were used to develop the models. For the first time, we used recommended machine learning algorithms to develop hybrid prediction models that reduce the number of significant KPIs while improving HbA1c prediction accuracy. The hybrid model combining Random Forest (RF) and Logistic Regression (LR), using only 4 of the 10 KPIs, outperformed the other models, achieving an accuracy of 0.93, precision of 0.95, recall of 0.90, F-score of 0.92, area under the curve (AUC) of 0.88, and Gini index of 0.76. The significant variables identified by the model through backward elimination were age, body mass index (BMI), triglycerides (TG), and high-density lipoprotein (HDL).
The proposed model helps healthcare providers identify patients at risk of T2D by monitoring fewer key predictors of HbA1c levels, enhancing early intervention strategies for managing diabetes in Saudi Arabia.
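The reported metrics for the RF + LR hybrid are internally consistent under the standard definitions: the F-score is the harmonic mean of precision and recall, and the Gini index of a binary classifier is 2 × AUC − 1. The short Python sketch that follows checks this, assuming the reported F-score is the balanced F1 score (β = 1); the helper names `f_score` and `gini_from_auc` are illustrative and do not come from the study.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1 (beta = 1).
    """
    return 2 * precision * recall / (precision + recall)


def gini_from_auc(auc: float) -> float:
    """Gini index of a binary classifier, defined as 2 * AUC - 1."""
    return 2 * auc - 1


# Values reported in the abstract for the RF + LR hybrid model
precision, recall, auc = 0.95, 0.90, 0.88

print(round(f_score(precision, recall), 2))  # 0.92
print(round(gini_from_auc(auc), 2))          # 0.76
```

Accuracy (0.93) cannot be checked the same way, because it also depends on the number of true negatives and on the case/control balance, neither of which the abstract reports.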