Jian Yazan, Pasquier Michel, Sagahyroon Assim, Aloul Fadi
Department of Computer Science and Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates.
Healthcare (Basel). 2021 Dec 9;9(12):1712. doi: 10.3390/healthcare9121712.
Diabetes mellitus (DM) is a chronic, potentially life-threatening disease. Over time it can affect any part of the body, resulting in serious complications such as nephropathy, neuropathy, and retinopathy. In this work, several supervised classification algorithms were applied to build models that predict eight diabetes complications: metabolic syndrome, dyslipidemia, neuropathy, nephropathy, diabetic foot, hypertension, obesity, and retinopathy. The study used a dataset collected by the Rashid Center for Diabetes and Research (RCDR) in Ajman, UAE, consisting of 884 records with 79 features. Preprocessing steps were applied to handle missing values and imbalanced data. Furthermore, feature selection was performed to select the top five and top ten features for each complication. The final number of records used to train the binary classifier for each complication was as follows: metabolic syndrome, 428; dyslipidemia, 836; neuropathy, 223; nephropathy, 233; diabetic foot, 240; hypertension, 586; obesity, 498; retinopathy, 228. Repeated stratified k-fold cross-validation (k = 10, with 10 repetitions) was employed to obtain a more reliable estimate of performance. The models were evaluated using accuracy and F1-score, which reached maxima of 97.8% and 97.7%, respectively. Moreover, a comparison of the performance achieved with different attribute sets showed that adequate classifiers can still be built using only a selected subset of features.
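The evaluation protocol described above (repeated stratified k-fold cross-validation scored by accuracy and F1-score) can be illustrated with a minimal sketch. The abstract does not say which library, classifiers, or features were used, so everything here is an assumption made for illustration: the helper names (`repeated_stratified_kfold`, `fit_threshold`, `evaluate`), the single-feature threshold classifier standing in for the real models, and the synthetic data standing in for the RCDR dataset.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs, k per repeat, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        offset = 0
        for y in sorted(by_class):
            idx = by_class[y][:]
            rng.shuffle(idx)  # a fresh shuffle per repeat gives different folds
            # Deal indices round-robin so every fold keeps the class ratio.
            for j, i in enumerate(idx):
                folds[(offset + j) % k].append(i)
            offset += len(idx)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fit_threshold(xs, ys):
    """Hypothetical stand-in classifier: split at the midpoint of the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return (mean_pos + mean_neg) / 2, mean_pos >= mean_neg


def evaluate(xs, ys, k=10, repeats=10, seed=0):
    """Average accuracy and F1 over all k * repeats train/test splits."""
    accs, f1s = [], []
    for train, test in repeated_stratified_kfold(ys, k, repeats, seed):
        thr, pos_is_high = fit_threshold([xs[i] for i in train],
                                         [ys[i] for i in train])
        preds = [int((xs[i] >= thr) == pos_is_high) for i in test]
        truth = [ys[i] for i in test]
        accs.append(accuracy(truth, preds))
        f1s.append(f1_score(truth, preds))
    return sum(accs) / len(accs), sum(f1s) / len(f1s)


# Synthetic, imbalanced toy data (NOT the RCDR dataset): 60 negatives, 40 positives.
data_rng = random.Random(42)
ys = [0] * 60 + [1] * 40
xs = [data_rng.gauss(0, 1) for _ in range(60)] + [data_rng.gauss(3, 1) for _ in range(40)]
mean_acc, mean_f1 = evaluate(xs, ys)
```

With k = 10 and 10 repetitions, each model is trained and scored 100 times, and the reported figure is the mean over those runs. Stratification keeps the positive-to-negative ratio roughly constant in every fold, which matters when a complication is rare in the data.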