Development and evaluation of a chronic kidney disease risk prediction model using random forest.

Author information

Mendapara Krish

Affiliation

Faculty of Health Sciences, Queen's University, Kingston, ON, Canada.

Publication information

Front Genet. 2024 Jun 27;15:1409755. doi: 10.3389/fgene.2024.1409755. eCollection 2024.

Abstract

This research aims to advance the detection of chronic kidney disease (CKD) through a novel gene-based predictive model, leveraging recent breakthroughs in gene sequencing. We sourced and merged gene expression profiles of CKD-affected renal tissues from the Gene Expression Omnibus (GEO) database and split them into training and validation sets in a 7:3 ratio. The training set included 141 CKD and 33 non-CKD specimens, while the validation set had 60 and 14, respectively. The disease risk prediction model was constructed on the training set, and the validation set was used to confirm its ability to identify CKD. Model development began with evaluating differentially expressed genes (DEGs) between the two groups. Using Lasso and random forest (RF) methods, we isolated six genes (DUSP1, GADD45B, IFI44L, IFI30, ATF3, and LYZ) that are critical in differentiating CKD from non-CKD tissues. We tuned the RF model's mtry parameter through 10-fold cross-validation, repeated five times. The model performed robustly, with an average AUC of 0.979 across the folds and an accuracy of 91.18%. Validation tests further confirmed its efficacy, with 94.59% accuracy and an AUC of 0.990. External validation using dataset GSE180394 yielded an AUC of 0.913, 89.83% accuracy, and a sensitivity of 0.889, underscoring the model's reliability. In summary, the study identified critical genetic biomarkers and successfully developed a novel disease risk prediction model for CKD. This model can serve as a valuable tool for CKD risk assessment and contribute significantly to CKD identification.
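The tuning procedure described in the abstract (a stratified split of roughly 7:3, then 10-fold cross-validation repeated five times to choose mtry) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the abstract does not name the software used, so scikit-learn is assumed here, with `max_features` standing in as the rough counterpart of mtry (the number of features tried at each split). The data are synthetic random numbers shaped to match the reported class counts, not the GEO expression profiles, and the tree count, seeds, and variable names are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    train_test_split,
)

# ASSUMPTION: synthetic stand-in data, NOT the GEO expression profiles.
# Class counts follow the abstract: 141 + 60 CKD and 33 + 14 non-CKD samples,
# each described by six features (one per selected gene).
rng = np.random.default_rng(0)
n_ckd, n_non_ckd, n_genes = 201, 47, 6
X = np.vstack([
    rng.normal(loc=1.0, scale=1.0, size=(n_ckd, n_genes)),
    rng.normal(loc=0.0, scale=1.0, size=(n_non_ckd, n_genes)),
])
y = np.concatenate([np.ones(n_ckd, dtype=int), np.zeros(n_non_ckd, dtype=int)])

# Stratified hold-out split: 74 of 248 samples approximates the 7:3 ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=74, stratify=y, random_state=0
)

# 10-fold cross-validation repeated five times (50 train/validate rounds).
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)

# Grid over max_features (assumed analogue of mtry), scored by mean AUC.
search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_features": [1, 2, 3, 4, 5, 6]},
    scoring="roc_auc",
    cv=cv,
)
search.fit(X_train, y_train)  # refits the best setting on the full training set

# Evaluate the refitted model on the held-out validation set.
proba = search.predict_proba(X_test)[:, 1]  # probability of class 1 (CKD)
pred = search.predict(X_test)
auc = roc_auc_score(y_test, proba)
accuracy = accuracy_score(y_test, pred)
sensitivity = recall_score(y_test, pred, pos_label=1)

print("best max_features:", search.best_params_["max_features"])
print("mean cross-validated AUC:", round(search.best_score_, 3))
print("hold-out AUC:", round(auc, 3))
print("hold-out accuracy:", round(accuracy, 3))
print("hold-out sensitivity:", round(sensitivity, 3))
```

Because the inputs are random numbers, the printed metrics say nothing about CKD; the sketch only shows how the split, the repeated cross-validation, and the three reported metrics (AUC, accuracy, sensitivity) fit together. Stratifying the split keeps the CKD to non-CKD proportion similar in both sets, which matters when one class is much smaller than the other.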

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d60/11236722/bdb3808af0f1/fgene-15-1409755-g001.jpg
