Zhou Bingkun, Zhou Hu, Huang Xiaodong, Liu Shijie
Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, China.
Department of Medicine, Nephrology Division, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.
Front Cell Dev Biol. 2025 Aug 15;13:1627355. doi: 10.3389/fcell.2025.1627355. eCollection 2025.
Early diagnosis and intervention for chronic kidney disease (CKD) can significantly improve patients' quality of life and prognosis. In addition to routine laboratory indicators and medical history, risk prediction models can be used to predict CKD outcomes. However, CKD prognostic prediction models based on transcriptomics and machine learning are currently lacking.
Using weighted correlation network analysis (WGCNA) and the random forest algorithm in GSE137570, three core gene sets of different sizes were constructed. These were externally validated in GSE66494 and GSE180394, and their predictive performance was evaluated in GSE45980 using receiver operating characteristic (ROC) curves. Predictive models were built in GSE60861 using Cox regression, LASSO regression, and logistic regression. The reliability of the human CKD transcriptomic analysis and the feasibility of functional studies were then validated in a mouse unilateral ureteral obstruction (UUO) model.
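As an illustration of the ROC evaluation step, the area under the ROC curve (AUC) for a gene set score can be computed as the probability that a randomly chosen case scores higher than a randomly chosen control. The sketch below is a minimal pure-Python version under that definition; the function name `roc_auc` and the toy scores and labels are hypothetical and are not taken from the study's data or code.

```python
def roc_auc(scores, labels):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive sample has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one gene set score per sample, 1 = CKD, 0 = control.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for cohorts of the size typical of GEO datasets; a rank-based implementation would scale better.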
Combining WGCNA and differential gene analysis identified 9 genes positively associated and 20 genes negatively associated with CKD occurrence and development. Using the random forest algorithm, three gene sets were constructed: a minimal gene set (3 genes), a medium gene set (9 genes), and a maximal gene set (10 genes). In external validation, the maximal PLAGE score had the best classification performance for CKD in GSE66494 (AUC = 0.767) and GSE180394 (AUC = 0.760), and the medium PLAGE score predicted CKD progression in GSE45980 (AUC = 0.758). In multivariate modeling, Cox regression produced a risk model containing only the minimal z-score, and subsequent LASSO regression retained gender and the minimal z-score, whereas no multivariate logistic regression model could be constructed with any score. In the mouse unilateral ureteral obstruction model, KEGG enrichment was highly similar between mouse and human CKD, and the core genes related to the occurrence and progression of human CKD remained diagnostically valuable in mice.
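The abstract does not specify how the gene set z-scores were calculated. One common definition standardizes each gene across samples and then divides the per-sample sum of standardized values by the square root of the number of genes. The sketch below assumes that definition; the function name `gene_set_zscore`, the gene labels, and the expression values are hypothetical placeholders.

```python
import math
import statistics


def gene_set_zscore(expr, gene_set):
    """Combined z-score per sample for one gene set.

    expr maps a gene name to its expression values across samples.
    Genes missing from expr or with zero variance are skipped.
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("no gene from the set is present in expr")
    n_samples = len(expr[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for g in genes:
        values = expr[g]
        sd = statistics.stdev(values)  # sample standard deviation
        if sd == 0:
            continue  # a constant gene carries no information
        mu = statistics.mean(values)
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd
        used += 1
    if used == 0:
        raise ValueError("all genes in the set have zero variance")
    return [t / math.sqrt(used) for t in totals]


# Hypothetical expression matrix: three samples, three placeholder genes.
expr = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],
}
z = gene_set_zscore(expr, ["GENE_A", "GENE_B"])
print(z)
```

Dividing by the square root of the gene count keeps scores from gene sets of different sizes on a comparable scale, which matters when minimal, medium, and maximal sets are compared side by side.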
This study provides a machine learning-derived, transcriptomics-based risk prediction model for the occurrence and development of CKD and offers potential target genes for further experimental research on CKD.