Department of Public Health, College of Medical and Health Science, Samara University, Samara, Ethiopia.
College of Veterinary Medicine, Samara University, Samara, Ethiopia.
Sci Rep. 2023 May 13;13(1):7779. doi: 10.1038/s41598-023-34906-1.
Ethiopia has been challenged by the growing magnitude of diabetes in general and type-2 diabetes in particular. Extracting knowledge from stored datasets can provide an important basis for better decisions on rapid diabetes diagnosis and for prediction that supports early intervention. This study addressed these problems by applying supervised machine learning algorithms to classify and predict type-2 diabetes disease status, and may provide context-specific information to program planners and policy makers so that priority can be given to the more affected groups. The objective was to apply supervised machine learning algorithms, compare them, and select the best algorithm based on performance for classifying and predicting type-2 diabetes disease status (positive or negative) in public hospitals of Afar regional state, Northeastern Ethiopia. The study was conducted in Afar regional state from February to June 2021. Seven supervised machine learning algorithms were applied to secondary data from a review of medical database records: decision tree (pruned J48), artificial neural network, k-nearest neighbor, support vector machine, binary logistic regression, random forest, and Naïve Bayes. A dataset of 2239 samples from diabetes diagnoses recorded from 2012 to April 22, 2020 (1523 with type-2 diabetes and 716 without type-2 diabetes) was checked for completeness prior to analysis. WEKA 3.7 was used to run all algorithms. The algorithms were compared on correctly classified rate, kappa statistic, confusion matrix, area under the curve, sensitivity, and specificity.
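Most of the comparison criteria listed above (correctly classified rate, kappa statistic, sensitivity, and specificity) can be derived from a 2x2 confusion matrix. The following is a minimal Python sketch of those standard definitions. The helper name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the study, which used WEKA rather than this code.

```python
def binary_metrics(tp, fn, fp, tn):
    """Summary metrics from a 2x2 confusion matrix.

    tp/fn: actual positives predicted positive/negative
    fp/tn: actual negatives predicted positive/negative
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n            # correctly classified rate
    sensitivity = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)        # true negative rate

    # Agreement expected by chance, from the row and column marginals.
    p_chance_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_chance_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_expected = p_chance_pos + p_chance_neg

    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for illustration only (not data from the study).
example = binary_metrics(tp=90, fn=10, fp=20, tn=80)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Area under the curve is not included here because it depends on the ranked prediction scores rather than on a single confusion matrix.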
Of the seven supervised machine learning algorithms, random forest gave the best classification and prediction results [correctly classified rate (93.8%), kappa statistic (0.85), sensitivity (0.98), area under the curve (0.97), and confusion matrix (446 of 454 actual positives correctly predicted)], followed by pruned J48 decision tree [correctly classified rate (91.8%), kappa statistic (0.80), sensitivity (0.96), area under the curve (0.91), and confusion matrix (438 of 454 actual positives correctly predicted)] and k-nearest neighbor [correctly classified rate (89.8%), kappa statistic (0.76), sensitivity (0.92), area under the curve (0.88), and confusion matrix (421 of 454 actual positives correctly predicted)]. Random forest, pruned J48 decision tree, and k-nearest neighbor performed better than the other algorithms in classifying and predicting type-2 diabetes disease status. Based on this performance, the random forest algorithm can be considered suggestive and supportive for clinicians when diagnosing type-2 diabetes.
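As a consistency check, sensitivity can be recomputed from the confusion-matrix counts reported above, assuming each count is the number of the 454 actual positives that the algorithm predicted as positive. This is a minimal Python sketch; the variable names are illustrative and the code is not part of the original analysis.

```python
# Counts reported in the abstract: correctly predicted positives
# out of 454 actual positives for the three best algorithms.
ACTUAL_POSITIVES = 454
true_positives = {
    "random forest": 446,
    "pruned J48 decision tree": 438,
    "k-nearest neighbor": 421,
}


def sensitivity(tp, actual_positives):
    """Share of actual positives that were predicted positive."""
    return tp / actual_positives


recomputed = {
    name: sensitivity(tp, ACTUAL_POSITIVES)
    for name, tp in true_positives.items()
}

for name, value in recomputed.items():
    print(f"{name}: {value:.3f}")
```

The recomputed values (about 0.982, 0.965, and 0.927) are close to the reported sensitivities of 0.98, 0.96, and 0.92; the small differences are presumably due to rounding or truncation in the reported figures.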