Identification of progression-related genes and construction of prognostic model for chronic kidney disease by machine learning.

Author information

Zhou Bingkun, Zhou Hu, Huang Xiaodong, Liu Shijie

Affiliations

Department of Kidney Transplantation, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, China.

Department of Medicine, Nephrology Division, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.

Publication information

Front Cell Dev Biol. 2025 Aug 15;13:1627355. doi: 10.3389/fcell.2025.1627355. eCollection 2025.

Abstract

BACKGROUND

Early diagnosis and intervention for chronic kidney disease (CKD) can significantly improve patients' quality of life and prognosis. In addition to routine laboratory indicators and medical history, risk prediction models can predict CKD outcomes. However, CKD prognostic prediction models based on transcriptomics and machine learning are currently lacking.

METHODS

Weighted correlation network analysis (WGCNA) and a random forest algorithm were applied to GSE137570 to construct three core gene sets of different sizes. The gene sets were externally validated in GSE66494 and GSE180394, and their predictive performance was evaluated in GSE45980 using receiver operating characteristic (ROC) curves. Predictive models were built in GSE60861 using Cox regression, LASSO regression, and logistic regression. The reliability of the human CKD transcriptomic analysis and the feasibility of functional studies were validated in a mouse unilateral ureteral obstruction (UUO) model.

RESULTS

Combining WGCNA with differential gene analysis identified 9 genes positively associated and 20 genes negatively associated with the occurrence and development of CKD. Using the random forest algorithm, three gene sets were constructed: a minimal gene set (, , ), a medium gene set (, , , , , , , , ), and a maximal gene set (, , , , , , , , , ). In external validation, the maximal PLAGE score had the best classification performance for CKD in GSE66494 (AUC = 0.767) and in GSE180394 (AUC = 0.760), and the medium PLAGE score predicted CKD progression in GSE45980 with an AUC of 0.758. In multivariate modeling, Cox regression yielded a risk model containing only the minimal z-score, and LASSO regression retained gender and the minimal z-score, whereas a multivariate logistic regression model could not be constructed with any score. In the mouse UUO model, KEGG enrichment showed a high degree of similarity between mouse and human CKD, and the core genes related to the occurrence and progression of human CKD remained diagnostically valuable in mice.
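
The gene-set scores and AUC values reported above can be illustrated with a minimal sketch. It assumes a combined z-score (per-gene z-scores summed across the set and divided by the square root of the number of genes) and computes the AUC as the probability that a case outscores a control. The abstract does not state how the scores were computed, so this scoring formula, the function names, the gene names (`GENE_A`, `GENE_B`), and the toy expression values are all hypothetical and are not the authors' pipeline or data.

```python
from math import sqrt
from statistics import mean, pstdev


def gene_set_zscore(expr, gene_set):
    """Combined z-score per sample for one gene set (assumed formula).

    expr maps a gene name to a list of expression values, one per sample.
    """
    n_samples = len(next(iter(expr.values())))
    totals = [0.0] * n_samples
    used = 0
    for gene in gene_set:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no information
        used += 1
        for i, value in enumerate(values):
            totals[i] += (value - mu) / sd
    if used == 0:
        raise ValueError("no informative genes in the gene set")
    return [total / sqrt(used) for total in totals]


def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. labels: 1 = case, 0 = control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data: hypothetical genes, three controls followed by three cases.
toy_expr = {
    "GENE_A": [1.0, 1.2, 0.9, 2.5, 2.8, 2.4],
    "GENE_B": [0.5, 0.7, 0.6, 1.9, 1.5, 2.1],
}
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = gene_set_zscore(toy_expr, ["GENE_A", "GENE_B"])
toy_auc = roc_auc(toy_scores, toy_labels)
```

On this scale, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which is the frame in which values such as 0.767 are read.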

CONCLUSION

This study provides a machine learning-derived, transcriptomics-based risk prediction model for the occurrence and development of CKD and offers potential target genes for further experimental research on CKD.

Graphical abstract
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e2c/12395503/0000c94b9a51/fcell-13-1627355-g001.jpg
