Department of General, Visceral, Cancer and Transplant Surgery, Faculty of Medicine and University Hospital of Cologne, Kerpener Straße 62, 50937, Cologne, Germany.
Data Science of Bioimages Lab, Center for Molecular Medicine Cologne (CMMC), Faculty of Medicine and University Hospital of Cologne, University of Cologne, Robert-Koch-Straße 21, 50937, Cologne, Germany.
J Nephrol. 2024 Jul;37(6):1631-1642. doi: 10.1007/s40620-024-01967-y. Epub 2024 Jun 5.
Living kidney donors are screened pre-donation to estimate their risk of end-stage kidney disease (ESKD). We evaluated machine learning (ML) for predicting the progression of kidney function deterioration over time, using the slope of the estimated glomerular filtration rate (eGFR) as the target variable.
We included 238 living kidney donors who underwent donor nephrectomy. We divided the dataset based on the eGFR slope in the third follow-up year, resulting in 185 donors with an average eGFR slope and 53 donors whose eGFR slope indicated accelerated decline. We trained three ML models (Random Forest [RF], Extreme Gradient Boosting [XG], Support Vector Machine [SVM]) and Logistic Regression (LR) for predictions. The models were trained on predefined data subsets to explore whether the parameters of an ESKD risk score alone suffice or whether additional clinical and time-zero biopsy parameters improve predictions. ML-driven feature selection identified the best predictive parameters.
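The grouping step described above can be illustrated with a minimal sketch. The abstract does not state how the eGFR slope was computed or which cutoff separated the two groups, so both are assumptions here: the slope is taken as an ordinary least-squares fit of eGFR against follow-up time, and the function names and the `cutoff` value of -3.0 are hypothetical placeholders, not the study's definition.

```python
from statistics import mean


def egfr_slope(years, egfr):
    """Least-squares slope of eGFR against time (eGFR units per year).

    Assumption: a simple linear fit; the study's actual slope
    definition is not given in the abstract.
    """
    if len(years) != len(egfr) or len(years) < 2:
        raise ValueError("need at least two paired measurements")
    t_bar = mean(years)
    y_bar = mean(egfr)
    sxx = sum((t - t_bar) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("all measurements share the same time point")
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(years, egfr))
    return sxy / sxx


def label_donor(slope, cutoff=-3.0):
    """Assign a donor to one of the two groups.

    The cutoff is a hypothetical placeholder, not the study's threshold.
    """
    return "accelerated" if slope < cutoff else "average"


# Illustrative values only, not study data.
slope = egfr_slope([1, 2, 3], [60.0, 58.0, 56.0])
group = label_donor(slope)
```

A binary label of this kind is what the four classifiers would then be trained to predict from pre-donation parameters.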
None of the four models classified the eGFR slope with an AUC greater than 0.6 or an F1 score above 0.41, regardless of the data subset used for training. After ML-driven feature selection and retraining on the selected features, random forest and extreme gradient boosting outperformed the other models, achieving an AUC of 0.66 and an F1 score of 0.44. After feature selection, two predictive donor attributes appeared consistently in all models: smoking-related features and glomerulitis of the Banff Lesion Score.
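For readers less familiar with the two metrics reported above, the following is a minimal pure-Python sketch of how AUC and F1 are computed for a binary classifier. It assumes the accelerated-decline group is coded as the positive class (1); the abstract does not say which class was treated as positive, and the function names are illustrative rather than taken from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example, not study data.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1, 0, 1, 0]
auc = roc_auc(y_true, scores)   # 3 of 4 positive/negative pairs ranked correctly
f1 = f1_score(y_true, y_pred)   # precision 0.5, recall 0.5
```

As a point of reference under the same assumption, with 53 positive and 185 negative donors, a classifier that labels every donor as accelerated would reach an F1 score of 106/291, roughly 0.36, while an uninformative ranking corresponds to an AUC of 0.5.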
Training ML models on distinct predefined data subsets yielded unsatisfactory results. However, the performance of random forest and extreme gradient boosting improved when they were trained exclusively on ML-selected features, suggesting that the quality, rather than the quantity, of features is crucial for ML model performance. This study offers insights into the application of emerging ML techniques for the screening of living kidney donors.