Gan Tao, Wei Xiaomeng, Xing Yuanhao, Hu Zhili
Department of Gastrointestinal Surgery, Liuzhou People's Hospital affiliated to Guangxi Medical University, Liuzhou, Guangxi Province, China.
Biomed Eng Comput Biol. 2024 Nov 2;15:11795972241293516. doi: 10.1177/11795972241293516. eCollection 2024.
Colorectal cancer (CRC) remains a significant health burden globally, necessitating a deeper understanding of its molecular landscape and prognostic markers. This study characterized ferroptosis-related genes (FRGs) to construct models for predicting overall survival (OS) across various CRC datasets.
In the TCGA-COAD dataset, differentially expressed genes (DEGs) between tumor and normal tissues were identified using the DESeq2 package. Prognostic genes associated with OS, disease-specific survival, and progression-free interval were identified using the survival package. Additionally, FRGs were downloaded from the FerrDb website and categorized into unclassified, marker, and driver genes. Finally, multiple models (CoxBoost, Elastic Net, Gradient Boosting Machine, LASSO Regression, Partial Least Squares Regression for Cox Regression, Ridge Regression, Random Survival Forest [RSF], stepwise Cox Regression, Supervised Principal Components analysis, and Support Vector Machines) were employed to predict OS across multiple datasets (TCGA-COAD, GSE103479, GSE106584, GSE17536, GSE17537, GSE29621, GSE39084, GSE39582, and GSE72970), using the genes at the intersection of the DEGs, the genes prognostic for OS, disease-specific survival, and progression-free interval, and the FRG categories.
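The gene-selection step described above amounts to intersecting several gene lists. A minimal sketch in Python follows; the study itself used R packages, and the gene lists here are toy placeholders (the names `GENE_A` to `GENE_D` are hypothetical), not the study's actual results. How the three FRG categories were combined before intersection is not stated in the abstract, so a single pooled FRG set is assumed.

```python
def intersect_gene_sets(*gene_sets):
    """Return the sorted list of genes present in every input collection."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)
    return sorted(common)


# Toy placeholder lists (assumption: illustrative only, not study data).
degs = {"ASNS", "TIMP1", "GENE_A", "GENE_B"}       # tumor vs. normal DEGs
os_genes = {"ASNS", "TIMP1", "GENE_A"}             # prognostic for OS
dss_genes = {"ASNS", "TIMP1", "GENE_C"}            # prognostic for disease-specific survival
pfi_genes = {"ASNS", "TIMP1", "GENE_A", "GENE_C"}  # prognostic for progression-free interval
frgs = {"ASNS", "TIMP1", "GENE_D"}                 # pooled ferroptosis-related genes (assumed)

candidates = intersect_gene_sets(degs, os_genes, dss_genes, pfi_genes, frgs)
print(candidates)
```

Only genes that pass every filter survive, which is why a strict intersection of this kind tends to yield a short candidate list.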
Six intersection genes (ASNS, TIMP1, H19, CDKN2A, HOTAIR, and ASMTL-AS1) were identified; all were upregulated in tumor tissues and associated with poor survival outcomes. In the TCGA-COAD dataset, the RSF model demonstrated the highest concordance index. Kaplan-Meier analysis revealed significantly lower OS probabilities in the high-risk groups identified by the RSF model. The RSF model exhibited high accuracy, with AUC values of 0.978, 0.985, and 0.965 for 1-, 3-, and 5-year survival predictions, respectively. Calibration curves demonstrated excellent agreement between predicted and observed survival probabilities. Decision curve analysis confirmed the clinical utility of the RSF model. Additionally, the model's performance was validated in the GSE29621 dataset.
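The concordance index used to rank the models measures how often a model assigns the higher risk score to the patient who dies earlier. A minimal sketch of Harrell's C in plain Python is given below, assuming right-censored data; the function name and the toy inputs are illustrative and are not taken from the study, which would have relied on established R implementations.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when subject i also has the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data (assumption: illustrative only).
times = [5, 10, 15, 20]          # follow-up time
events = [1, 1, 0, 1]            # 1 = death observed, 0 = censored
risk = [0.9, 0.7, 0.8, 0.1]      # higher score = higher predicted risk

c_index = concordance_index(times, events, risk)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing models by this statistic on the same dataset is a comparison of how well each one orders patients by risk.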
The study underscores the prognostic relevance of 6 intersection genes in CRC, providing insights into potential therapeutic targets and biomarkers for patient stratification. The RSF model demonstrates robust predictive performance, suggesting its utility in clinical risk assessment and personalized treatment strategies.