Oliullah Khondokar, Rasel Mahedi Hasan, Islam Md Manzurul, Islam Md Reazul, Wadud Md Anwar Hussen, Whaiduzzaman Md
Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh.
School of Information Systems, Queensland University of Technology, Brisbane, Australia.
J Diabetes Metab Disord. 2023 Nov 22;23(1):603-617. doi: 10.1007/s40200-023-01321-2. eCollection 2024 Jun.
Diabetes has become a leading cause of mortality in both developed and developing countries and affects a growing number of people worldwide. As its prevalence continues to rise, researchers have worked to develop accurate diabetes prediction models. This study applies a diverse set of machine learning algorithms to detect diabetes at an early stage, particularly in females, with the aim of giving physicians tools that enable timely intervention and improve patient outcomes.
This study employed several state-of-the-art machine learning techniques: a random forest classifier tuned with GridSearchCV, together with XGBoost, NGBoost, Bagging, LightGBM, and AdaBoost classifiers. These models were chosen as the base layer of the proposed stacked ensemble model because of their high accuracy. The dataset was preprocessed before being fed into the models to improve performance.
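The stacked ensemble described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only scikit-learn, so the XGBoost, NGBoost, and LightGBM base learners are omitted. The logistic-regression meta-learner, the hyperparameter grid, and the synthetic data from `make_classification` are placeholders, since the abstract does not specify any of them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split


def build_stacked_model(random_state=0):
    """Build a stacking classifier from a subset of the base learners named above."""
    # Random forest tuned by grid search; the grid itself is an assumption.
    rf_search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
        cv=3,
    )
    base_learners = [
        ("rf_grid", rf_search),
        ("bagging", BaggingClassifier(random_state=random_state)),
        ("adaboost", AdaBoostClassifier(random_state=random_state)),
    ]
    # The meta-learner is trained on out-of-fold predictions of the base layer.
    # Logistic regression is an assumed choice, not taken from the paper.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def run_demo():
    """Fit the sketch on synthetic stand-in data and return (model, test accuracy)."""
    # Synthetic binary-classification data; NOT the dataset used in the study.
    X, y = make_classification(
        n_samples=400, n_features=8, n_informative=5, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    model = build_stacked_model()
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    _, accuracy = run_demo()
    print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Additional gradient-boosting learners could be appended to `base_learners` in the same way if the corresponding libraries are installed. The accuracy printed here reflects only the synthetic data and says nothing about the figure reported in the study.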
The proposed model achieved an accuracy of 92.91%, which is competitive with existing approaches. In addition, SHapley Additive exPlanations (SHAP) were used to interpret the machine learning models.
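To illustrate the idea behind SHAP, the sketch below computes exact Shapley values for a toy scoring function by enumerating every feature ordering. It is a hedged illustration only: it does not use the SHAP library, whose estimators rely on faster approximations. The feature names (`glucose`, `bmi`, `age`), the coefficients, and the baseline values are all hypothetical and are not drawn from the study.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions of predict(instance) relative to predict(baseline).

    Features are switched from their baseline value to the instance value one
    at a time, and each feature's marginal effect is averaged over all orderings.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for i in order:
            current[i] = instance[i]
            new_output = predict(current)
            totals[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in totals]


def toy_risk_score(features):
    """Hypothetical score with a glucose-BMI interaction (illustrative only)."""
    glucose, bmi, age = features
    return 0.02 * glucose + 0.03 * bmi + 0.0005 * glucose * bmi + 0.01 * age


if __name__ == "__main__":
    instance = [150.0, 35.0, 50.0]   # hypothetical patient
    baseline = [100.0, 25.0, 30.0]   # hypothetical reference values
    phi = shapley_values(toy_risk_score, instance, baseline)
    for name, value in zip(["glucose", "bmi", "age"], phi):
        print(f"{name}: {value:+.4f}")
```

The attributions sum to the difference between the score for the instance and the score for the baseline. This additivity is what lets per-feature contributions be read as an explanation of a single prediction.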
We anticipate that these findings will be beneficial to healthcare providers, stakeholders, students, and researchers involved in diabetes prediction research and development.