Department of Computer Systems and Computing, School of Computer Science, Complutense University of Madrid, 28040, Madrid, Spain.
Comput Biol Med. 2024 Sep;180:108982. doi: 10.1016/j.compbiomed.2024.108982. Epub 2024 Aug 6.
Kidney transplant recipients face a high cardiovascular risk, which is a leading cause of death in this patient group. This article proposes combining feature selection and clustering, using machine learning techniques and mainstream statistical methods, to predict the survival outcomes of kidney transplant recipients. First, feature selection techniques (Boruta, Random Survival Forest and Elastic Net) are used to detect the most relevant variables. Subsequently, the set of variables obtained by each feature selection technique is used as input for the clustering algorithms (Consensus Clustering, Self-Organizing Map and Agglomerative Clustering) to determine which combination of feature selection technique, clustering algorithm and number of clusters maximizes intercluster variability. Next, a mechanism called False Clustering Discovery Reduction is applied to obtain the minimum number of statistically differentiable populations after applying a control metric. This metric is based on a variance test that confirms that reducing the number of clusters does not cause a significant loss in the heterogeneity obtained. The approach was applied to the Organ Procurement and Transplantation Network medical dataset (n = 11,332). The combination of Random Survival Forest and Consensus Clustering yielded the optimal result of 4 clusters, starting from 8 initial ones. Finally, Kaplan-Meier survival curves are generated for each population to predict the survival of new patients based on the predictions of the XGBoost classifier, which achieved an overall multi-class AUC of 98.11%.
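The final step, one Kaplan-Meier curve per population, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kaplan_meier` and `curves_by_cluster`, the toy follow-up times, and the cluster labels are all hypothetical, and the labels are simply given here rather than produced by the feature selection, clustering, and XGBoost stages described above. The sketch only shows the standard product-limit estimator applied separately to each group.

```python
from collections import defaultdict


def kaplan_meier(durations, events):
    """Product-limit estimate: [(time, survival_probability)] at each event time.

    durations: follow-up time per patient
    events:    1 if the event (death) was observed, 0 if the patient was censored
    """
    deaths = defaultdict(int)  # observed events at each time
    exits = defaultdict(int)   # events plus censorings at each time
    for t, e in zip(durations, events):
        exits[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # S(t) = S(t-) * (1 - d / n_at_risk)
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone who died or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve


def curves_by_cluster(durations, events, labels):
    """Group patients by cluster label and fit one curve per group."""
    groups = defaultdict(lambda: ([], []))
    for t, e, c in zip(durations, events, labels):
        groups[c][0].append(t)
        groups[c][1].append(e)
    return {c: kaplan_meier(ts, es) for c, (ts, es) in groups.items()}


if __name__ == "__main__":
    # Toy data (assumed, not from the OPTN dataset): times in arbitrary units.
    durations = [5, 8, 8, 12, 3, 4, 9, 10]
    events = [1, 1, 0, 1, 1, 1, 0, 1]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical cluster assignments
    for cluster, curve in sorted(curves_by_cluster(durations, events, labels).items()):
        print(cluster, [(t, round(s, 3)) for t, s in curve])
```

Under this reading of the abstract, a new patient would first be assigned to a population by the classifier and then be described by that population's curve. Censored patients contribute to the risk set until their last follow-up but do not produce a step in the curve.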