Department of BioHealth Informatics, School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA.
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA.
BMC Med Genomics. 2020 Sep 21;13(Suppl 9):135. doi: 10.1186/s12920-020-00775-0.
Colon cancer is one of the leading causes of cancer deaths in the USA and around the world. Beyond pathological indicators, molecular-level features such as gene expression levels and mutations may provide valuable information for precision treatment. Transcription factors act as critical regulators in all aspects of cell life, yet transcription factor-based biomarkers for colon cancer prognosis remain rare and are needed.
We implemented a novel process that combines the Cox proportional hazards (Cox PH) model with the random forest algorithm to select transcription factor variables and evaluate their prognostic prediction power. We selected the five top-ranked transcription factors and built a prediction model using Cox PH regression. Using Kaplan-Meier analysis, we validated the predictive model on four independent, publicly available datasets (GSE39582, GSE17536, GSE37892, and GSE17537) from the GEO database, comprising 925 colon cancer patients.
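In a Cox PH model the hazard is modeled as h(t | x) = h0(t) · exp(Σ βᵢxᵢ), so a patient's risk score is usually taken to be the linear predictor Σ βᵢxᵢ over the selected genes. The sketch below illustrates that step for the five transcription factors and then splits patients into high- and low-risk groups. It is a minimal illustration under stated assumptions, not the authors' code: the coefficient values are hypothetical placeholders (the abstract does not report the fitted values), the median cutoff is an assumed choice, and the helper names `risk_score` and `split_by_median` are hypothetical.

```python
"""Minimal sketch: Cox PH linear-predictor risk score plus a median split.

ASSUMPTIONS (not from the paper): the coefficients are placeholder values,
not fitted estimates, and the median cutoff is an illustrative choice.
"""
from statistics import median
from typing import Dict

# Hypothetical placeholder log hazard ratios for the five selected TFs.
# These are NOT the published model coefficients.
COEFS: Dict[str, float] = {
    "HOXC9": 0.30,
    "ZNF556": 0.25,
    "HEYL": 0.20,
    "HOXC4": -0.15,
    "HOXC6": 0.10,
}


def risk_score(expr: Dict[str, float], coefs: Dict[str, float] = COEFS) -> float:
    """Linear predictor sum_i beta_i * x_i; larger values mean higher estimated hazard."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())


def split_by_median(samples: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Label each patient 'high' or 'low' risk relative to the median risk score."""
    scores = {pid: risk_score(expr) for pid, expr in samples.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Toy expression values (e.g. standardized), invented for illustration only.
    genes = list(COEFS)
    samples = {
        "P1": dict(zip(genes, [2.0, 1.0, 0.5, 1.0, 0.0])),
        "P2": dict(zip(genes, [0.0] * 5)),
        "P3": dict(zip(genes, [1.0] * 5)),
        "P4": dict(zip(genes, [-1.0] * 5)),
    }
    print(split_by_median(samples))
```

With these toy inputs, P1 and P3 score above the median and are labeled high risk, while P2 and P4 fall in the low-risk group. A median split is only one option; a study could instead choose a cutoff from the training data by another rule.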
A five-transcription-factor-based predictive model for colon cancer prognosis was developed using TCGA colon cancer patient data. The five transcription factors identified for the model are HOXC9, ZNF556, HEYL, HOXC4, and HOXC6. The prediction power of the model was validated with four GEO datasets comprising 1584 patient samples. Kaplan-Meier curves and log-rank tests were generated for both the training and validation datasets, and a clear difference in overall survival time between the predicted low- and high-risk groups was observed. Gene set enrichment analysis was performed to further investigate the differences between the low- and high-risk groups at the gene pathway level, and the biological meaning of these differences was interpreted. Overall, our results show that the prediction model has strong predictive power for colon cancer prognosis.
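To make the validation step concrete, the sketch below implements the two standard tools named above: the Kaplan-Meier product-limit estimator of the survival curve and the two-group log-rank test, whose statistic is compared against a chi-square distribution with one degree of freedom. It is a minimal, standard-library-only illustration with hypothetical function names (`kaplan_meier`, `logrank_test`). It is not the authors' pipeline, and a real analysis would more likely rely on an established survival package (for example, R's `survival` or Python's `lifelines`).

```python
"""Minimal sketch: Kaplan-Meier estimator and two-group log-rank test (stdlib only).

Illustrative only; function names are hypothetical, not from the paper.
Event indicator convention: 1 = death observed, 0 = censored.
"""
import math
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, S(t)) at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all observations (deaths and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed  # everyone observed at t leaves the risk set
    return curve


def logrank_test(t1: List[float], e1: List[int],
                 t2: List[float], e2: List[int]) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for group 1 vs group 2, 1 df."""
    event_times = sorted({t for t, e in zip(t1 + t2, e1 + e2) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n1 = sum(1 for x in t1 if x >= t)  # at risk in group 1
        n2 = sum(1 for x in t2 if x >= t)  # at risk in group 2
        d1 = sum(1 for x, e in zip(t1, e1) if x == t and e)
        d2 = sum(1 for x, e in zip(t2, e2) if x == t and e)
        n, d = n1 + n2, d1 + d2
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths, group 1
        if n > 1:
            # Hypergeometric variance of d1 at this time point.
            var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


if __name__ == "__main__":
    # Toy data invented for illustration: group 1 dies earlier than group 2.
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
    print(logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

On the toy data, the log-rank statistic is about 5.05 (p ≈ 0.025), reflecting that every death in the first group occurs before any death in the second. In a study like this one, the two groups would be the predicted high- and low-risk patients.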
Transcription factors can be used to construct colon cancer prognostic signatures with strong prediction power. The variable selection process used in this study could also be applied to prognostic signature discovery in other cancer types. Our five-TF-based predictive model could help elucidate the hidden relationship between colon cancer patient survival and transcription factor activity. It will also provide further insight into the precision treatment of colon cancer patients from a genomic perspective.