Lu Wei, Pan Xiang, Dai Siqi, Fu Dongliang, Hwang Maxwell, Zhu Yingshuang, Zhang Lina, Wei Jingsun, Kong Xiangxing, Li Jun, Xiao Qian, Ding Kefeng
Department of Colorectal Surgery and Oncology, Key Laboratory of Cancer Prevention and Intervention, Ministry of Education, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
Cancer Center, Zhejiang University, Hangzhou, China.
J Oncol. 2021 Feb 10;2021:6657397. doi: 10.1155/2021/6657397. eCollection 2021.
Patients with stage II colorectal cancer have a heterogeneous prognosis, and those who experience recurrence have poor survival. In this study, we aimed to identify genes associated with stage II colorectal cancer recurrence by microarray meta-analysis and to build predictive models that stratify patients by recurrence-free survival.
We searched the Gene Expression Omnibus (GEO) database to retrieve eligible microarray datasets. A microarray meta-analysis was used to identify recurrence-associated genes shared across datasets. All samples were randomly divided into a training set and a test set. Two survival models (a lasso Cox model and a random survival forest model) were trained on the training set, and AUC values of the time-dependent receiver operating characteristic (ROC) curves were calculated. Survival analysis was performed to determine whether recurrence-free survival differed significantly between the predicted high-risk and low-risk groups in the test set.
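To make the time-dependent AUC concrete, the following is a minimal sketch of a cumulative/dynamic AUC at a fixed horizon. It is not the authors' implementation: the function name `naive_time_dependent_auc` and the toy data are hypothetical, and the sketch simply drops patients censored before the horizon instead of applying the inverse-probability-of-censoring weights that standard estimators use.

```python
def naive_time_dependent_auc(times, events, risk_scores, horizon):
    """Cumulative/dynamic AUC at `horizon`, ignoring censoring weights.

    Cases: patients with an observed recurrence at or before `horizon`.
    Controls: patients still recurrence-free after `horizon`.
    Patients censored at or before `horizon` are excluded (a simplifying
    assumption; proper estimators reweight instead of dropping them).
    """
    cases = [r for t, e, r in zip(times, events, risk_scores)
             if e == 1 and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0   # case correctly ranked as higher risk
            elif case_risk == control_risk:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: follow-up in months, event flag (1 = recurrence),
# and a model risk score for each of six patients.
toy_times = [10, 20, 30, 70, 80, 90]
toy_events = [1, 1, 0, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.5, 0.3, 0.6, 0.2]

auc_60 = naive_time_dependent_auc(toy_times, toy_events, toy_risk, horizon=60)
print(f"AUC at 60 months: {auc_60:.3f}")
```

In the toy data, the patient with a recurrence at 80 months counts as a control at the 60-month horizon, which is what distinguishes a time-dependent AUC from an ordinary binary-outcome AUC.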
Six datasets containing 651 stage II colorectal cancer patients were included in this study. The meta-analysis identified 479 recurrence-associated genes. KEGG and GO enrichment analysis showed that G protein-coupled glutamate receptor binding and Hedgehog signaling were significantly enriched. AUC values of the lasso Cox model and the random survival forest model at 60 months were 0.815 and 0.993, respectively. In addition, the random survival forest model indicated that the effects of gene expression on recurrence-free survival probability were nonlinear. According to the risk scores computed by the random survival forest model, the high-risk group had a significantly higher recurrence risk than the low-risk group (HR = 1.824, 95% CI: 1.079-3.084, P = 0.025).
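The reported hazard ratio, confidence interval, and P value can be checked for internal consistency. The sketch below assumes the interval is a Wald interval that is symmetric on the log scale. That is an assumption, since the abstract does not state how the interval or the P value was computed. Under it, the sketch recovers the standard error of the log hazard ratio and the corresponding two-sided P value.

```python
import math

# Values reported in the abstract.
hr, ci_low, ci_high = 1.824, 1.079, 3.084

# Assumption: a Wald 95% CI, i.e. log(HR) +/- 1.96 * SE on the log scale.
z_975 = 1.959964
log_hr = math.log(hr)
se_log_hr = (math.log(ci_high) - math.log(ci_low)) / (2 * z_975)

# A log-symmetric interval has the HR as the geometric mean of its bounds.
geometric_midpoint = math.sqrt(ci_low * ci_high)

# Two-sided P value from the standard normal distribution:
# erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
z_stat = log_hr / se_log_hr
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"geometric midpoint of CI: {geometric_midpoint:.3f}")
print(f"SE of log(HR): {se_log_hr:.3f}, z = {z_stat:.2f}, P = {p_value:.3f}")
```

Under this assumption the geometric midpoint of the interval is about 1.824 and the Wald P value is about 0.025, so the three reported figures agree with one another.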
We identified 479 genes associated with stage II colorectal cancer recurrence by microarray meta-analysis. The random survival forest model, which was based on this recurrence-associated gene signature, could strongly predict the recurrence risk of stage II colorectal cancer patients.