Abousaber Inam, Abdallah Haitham F, El-Ghaish Hany
Department of Information Technology, Faculty of Computers and Information Technology, University of Tabuk, Tabuk, Saudi Arabia.
Department of Electronics and Electrical Communication, Higher Institute of Engineering and Technology, Kafr El Sheikh, Egypt.
Front Artif Intell. 2025 Jan 7;7:1499530. doi: 10.3389/frai.2024.1499530. eCollection 2024.
Diabetes prediction using clinical datasets is crucial for medical data analysis. However, class imbalance, in which non-diabetic cases dominate, can significantly degrade machine learning model performance, leading to biased predictions and reduced generalization.
A novel predictive framework employing cutting-edge machine learning algorithms and advanced imbalance handling techniques was developed. The framework integrates feature engineering and resampling strategies to enhance predictive accuracy.
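The abstract does not say which resampling strategies the framework uses. As an illustration of the general idea only, the sketch below shows random oversampling, one common imbalance-handling technique. It is not the authors' method. The function name `random_oversample`, the toy feature values, and the labels are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Illustrative only: the source abstract does not specify its
    resampling strategy.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data so no rows are lost.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 4 non-diabetic (0) rows and 2 diabetic (1) rows.
rows = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.3], [4.0, 0.4], [5.0, 0.5], [6.0, 0.6]]
labels = [0, 0, 0, 0, 1, 1]

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

In practice, resampling of this kind is applied to the training split only, so that duplicated minority rows do not leak into the evaluation data.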
Rigorous testing was conducted on three datasets (PIMA, Diabetes Dataset 2019, and BIT_2019), demonstrating the robustness and adaptability of the methodology across varying data environments.
The experimental results highlight the critical role of model selection and imbalance mitigation in achieving reliable and generalizable diabetes predictions. This study offers significant contributions to medical informatics by proposing a robust data-driven framework that addresses class imbalance challenges, thereby advancing diabetes prediction accuracy.