Strategic Center for Diabetes Research, College of Medicine, King Saud University, Riyadh, Saudi Arabia.
J Healthc Eng. 2022 Apr 1;2022:7378307. doi: 10.1155/2022/7378307. eCollection 2022.
Diabetic kidney disease (DKD), a complication of diabetes, leads to progressive loss of kidney function. Timely intervention is known to improve outcomes, so screening patients to identify high-risk populations is important. Machine learning classification techniques can be applied to patient datasets to build predictive models that identify high-risk patients.
This study aims to identify a suitable classification technique for predicting DKD by applying different classification techniques to a DKD dataset and comparing their performance in the WEKA machine learning software.
The performance of nine classification techniques was analyzed on a DKD dataset with 410 instances and 18 attributes. Data were preprocessed with the PartitionMembershipFilter, and 10-fold cross-validation was performed on the dataset. Performance was assessed by execution time, accuracy, the numbers of correctly and incorrectly classified instances, the kappa statistic (K), mean absolute error, root mean squared error, and the true values of the confusion matrix.
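The evaluation protocol above (k-fold cross-validation, with accuracy, kappa, and error measures computed from the pooled confusion matrix) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's WEKA setup: it assumes numeric attributes, a binary class coded 0/1 (1 = DKD), a 1-nearest-neighbour classifier standing in for IBK (WEKA's IBk uses k = 1 by default), and unstratified folds (WEKA stratifies by default). All function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle row indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbor_predict(train, x):
    """Return the class of the training row closest to x (Euclidean distance, k = 1)."""
    _, label = min(train, key=lambda row: math.dist(row[0], x))
    return label


def cross_validate(data, k=10, seed=1):
    """Run k-fold CV on (features, label) pairs and pool results into one confusion matrix.

    Keys are (actual, predicted) with labels 0 (no DKD) and 1 (DKD).
    """
    cm = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        # Train on the other k-1 folds, then predict every row of the held-out fold.
        train = [row for i, row in enumerate(data) if i not in held_out]
        for i in fold:
            x, actual = data[i]
            cm[(actual, nearest_neighbor_predict(train, x))] += 1
    return cm


def metrics(cm):
    """Accuracy, Cohen's kappa, MAE and RMSE from a pooled 2x2 confusion matrix."""
    tp, tn = cm[(1, 1)], cm[(0, 0)]
    fp, fn = cm[(0, 1)], cm[(1, 0)]
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n  # observed agreement, i.e. accuracy
    # Agreement expected by chance, from the actual and predicted class totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    error_rate = (fp + fn) / n
    # With hard 0/1 predictions each mistake contributes 1 to both |error| and error^2.
    return {"accuracy": p_o, "kappa": kappa,
            "mae": error_rate, "rmse": math.sqrt(error_rate)}
```

Calling `metrics(cross_validate(data))` on a list of `(features, label)` pairs returns the four measures. WEKA derives mean absolute error and root mean squared error from predicted class probabilities rather than hard labels, so its reported values need not match this hard-prediction shortcut.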
With an accuracy of 93.6585% and the highest K value (0.8731), the IBK and random tree classifiers were the best-performing techniques. They also had the lowest root mean squared error (0.2496). Each of these prediction models produced 15 false-positive and 11 false-negative instances.
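As a consistency check, the reported error counts reproduce the reported accuracy for N = 410 instances. The kappa value additionally depends on how the correct predictions split between true positives (TP) and true negatives (TN), which the abstract does not report, so only its standard definition (Cohen's kappa) is shown here.

```latex
\begin{aligned}
\text{errors} &= FP + FN = 15 + 11 = 26 \\
\text{Accuracy} &= \frac{N - \text{errors}}{N} = \frac{410 - 26}{410} = \frac{384}{410} \approx 0.936585 \\
\kappa &= \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \frac{TP + TN}{N}, \qquad
p_e = \frac{(TP + FP)(TP + FN) + (TN + FN)(TN + FP)}{N^2}
\end{aligned}
```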
This study identified the IBK and random tree classifiers as the best-performing of the techniques compared and as accurate methods for predicting DKD.