Department of Computer Science Engineering, Bennett University, TechZone II, Greater Noida, India.
Pitney Bowes Software, Noida, India.
Comput Methods Biomech Biomed Engin. 2022 Jun;25(8):887-895. doi: 10.1080/10255842.2021.1985476. Epub 2021 Nov 2.
Chronic kidney disease (CKD) is one of the serious health concerns of the twenty-first century, affecting more than 37 million Americans. Applying machine learning (ML) techniques to clinical data allows CKD to be diagnosed early, and early detection can prevent many deaths. In this work, a clinical data set of 400 patients, available in the UCI repository, is used. This data set does not have an equal distribution of CKD and non-CKD samples, and this imbalance strongly influences the learning capabilities of classifiers. Genetic Programming (GP) is an ML technique based on the evolution of species. GP with a standard fitness function is also affected by the imbalance. A new Euclidean distance-based fitness function for GP is proposed to handle the imbalanced nature of the data set. To assess the robustness of the proposed work, other classification techniques are also applied: K-nearest neighbors (KNN), KNN with particle swarm optimization (PSO-KNN), and GP with the standard fitness function. Under ten-fold cross-validation, KNN achieves an accuracy of 83.54% with an AUC of 0.69, PSO-KNN achieves an accuracy of 96.79% with an AUC of 0.94, and GP with the newly proposed fitness function outperforms both KNN and PSO-KNN, achieving an accuracy of 99.33% with an AUC of 0.99.
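The abstract does not give the formula of the proposed fitness function, so the following is only a minimal Python sketch of one plausible Euclidean distance-based fitness, not the authors' method. It assumes fitness is measured as the distance between a classifier's per-class rates (sensitivity, specificity) and the ideal point (1, 1), rescaled to [0, 1]. The function names and the toy label counts are hypothetical and are not taken from the CKD data set.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy_fitness(y_true, y_pred):
    """Standard fitness: fraction of correctly classified samples."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def distance_fitness(y_true, y_pred):
    """Assumed variant: 1 minus the normalized Euclidean distance
    from (sensitivity, specificity) to the ideal point (1, 1)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    distance = math.hypot(1.0 - sensitivity, 1.0 - specificity)
    return 1.0 - distance / math.sqrt(2.0)  # sqrt(2) is the largest possible distance


# Hypothetical imbalanced toy labels: 90 positive, 10 negative.
y_true = [1] * 90 + [0] * 10
majority = [1] * 100                             # always predicts the majority class
balanced = [1] * 80 + [0] * 10 + [0] * 9 + [1]   # some errors in both classes

for name, pred in (("majority", majority), ("balanced", balanced)):
    print(name,
          round(accuracy_fitness(y_true, pred), 4),
          round(distance_fitness(y_true, pred), 4))
```

On these toy labels, plain accuracy ranks the majority-class predictor slightly above the balanced one, while the distance-based score ranks the balanced predictor well ahead. This is the kind of behavior a fitness function needs if GP is to avoid evolving classifiers that ignore the minority class.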