Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh.
Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh; Statistics Discipline, Khulna University, Khulna, 9208, Bangladesh.
Diabetes Metab Syndr. 2020 May-Jun;14(3):217-219. doi: 10.1016/j.dsx.2020.03.004. Epub 2020 Mar 10.
Diabetes has been recognized as a continuing health challenge for the twenty-first century, both in developed and developing countries including Bangladesh. The main objective of this study is to use machine learning (ML) based classifiers for automated detection and classification of diabetes.
The diabetes dataset was taken from the 2011 Bangladesh Demographic and Health Survey and comprises 1569 respondents, of whom 127 have diabetes. Two statistical tests, the independent t-test for continuous variables and the chi-square test for categorical variables, are used to determine the risk factors of diabetes. Six ML-based classifiers, namely support vector machine, random forest, linear discriminant analysis, logistic regression, k-nearest neighbors, and bagged classification and regression tree (Bagged CART), have been adopted to predict and classify diabetes.
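The two screening tests named above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names `chi_square_statistic` and `independent_t_statistic` are hypothetical, the t statistic uses the pooled-variance (equal variances) form, and the sample values are made up for demonstration and are not taken from the survey.

```python
from math import sqrt
from statistics import mean, variance


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table
    (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def independent_t_statistic(a, b):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical data for illustration only (not from the survey).
# Rows: exposure group; columns: diabetic / non-diabetic counts.
example_table = [[30, 70], [20, 80]]
chi2 = chi_square_statistic(example_table)

# Hypothetical continuous measurements for two groups.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat = independent_t_statistic(group_a, group_b)

print(f"chi-square = {chi2:.4f}, t = {t_stat:.4f}")
```

Each statistic would then be compared with the appropriate reference distribution (chi-square with (r-1)(c-1) degrees of freedom, t with n1+n2-2 degrees of freedom) to obtain a p-value.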
Our findings show that 11 of the 15 factors are significantly associated with diabetes. Bagged CART provides the highest accuracy and area under the curve, 94.3% and 0.600, respectively.
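As a rough reference point for the reported accuracy, the class balance stated in the methods (127 diabetic respondents out of 1569) can be turned into a majority-class baseline. The short calculation below is only a sketch: it assumes the baseline is computed on the full sample, whereas the abstract does not state which data split the reported 94.3% refers to.

```python
# Counts taken from the abstract.
n_respondents = 1569
n_diabetic = 127

# Share of respondents with diabetes.
prevalence = n_diabetic / n_respondents

# Accuracy of a trivial rule that labels every respondent as non-diabetic.
majority_baseline_accuracy = 1 - prevalence

print(f"prevalence = {prevalence:.3f}")
print(f"majority-class baseline accuracy = {majority_baseline_accuracy:.3f}")
```

With roughly 8% prevalence, the trivial rule already reaches about 91.9% accuracy on the full sample, which is why accuracy and area under the curve are usefully read together on imbalanced data.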
Bagged CART is a promising computational tool for the classification of diabetes and could help doctors in Bangladesh make decisions about controlling the disease.