Wang Jian-Yong, Chen Ling-Ling, Zhou Xiong-Hui
College of Informatics, Huazhong Agricultural University, Wuhan 430070, P.R. China.
Oncotarget. 2017 Jul 11;8(28):46398-46413. doi: 10.18632/oncotarget.18189.
Identifying prognostic genes in cancer is essential not only for the treatment of cancer patients but also for drug discovery. However, because of tumor heterogeneity, it remains a major challenge to select prognostic genes that can distinguish the risk of cancer patients across different data sets. In this situation, selected genes whose expression levels are statistically related to prognostic risk may merely be passengers. In this paper, based on gene expression data and prognostic data of ovarian cancer patients, we used conditional mutual information to construct a gene dependency network in which nodes (genes) with higher out-degree are more likely to be modulators of cancer prognosis. We then proposed the DirGenerank (Generank in a directed network) algorithm, which considers both the gene dependency network and the genes' correlations with prognostic risk, to identify a gene signature that can predict the prognostic risk of ovarian cancer patients. Using the ovarian cancer data set from TCGA (The Cancer Genome Atlas) as the training data set, the 40 genes with the highest importance were selected as the prognostic signature. Survival analysis of patients stratified by the prognostic signature in the testing data set and in four independent data sets showed that the signature can significantly distinguish the prognostic risk of cancer patients. Enrichment analysis of the signature against curated cancer genes and the drugs selected by CMAP showed that the genes in the signature may be drug targets for therapy. In summary, we have proposed a useful pipeline for identifying prognostic genes of cancer patients.
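The abstract does not give the DirGenerank update rule, so the following is only a minimal sketch of the general idea it describes: a PageRank-style ranking on a directed gene network that also weights each gene by its correlation with prognostic risk. The function name `dir_generank_sketch`, the `damping` parameter, the reversed flow of score (so that genes with many outgoing edges accumulate importance), and the toy genes and edges are all assumptions made for illustration, not the authors' implementation.

```python
def dir_generank_sketch(edges, relevance, damping=0.5, tol=1e-10, max_iter=1000):
    """Rank genes on a directed network (illustrative sketch, not the published method).

    edges     : list of (source, target) pairs; source -> target is a dependency edge.
    relevance : dict mapping gene -> non-negative prior, e.g. the absolute
                correlation of its expression with prognostic risk.
    Every gene named in `edges` is assumed to appear in `relevance`.
    """
    total = sum(relevance.values())
    if total <= 0:
        raise ValueError("relevance values must sum to a positive number")
    genes = sorted(relevance)
    prior = {g: relevance[g] / total for g in genes}

    # In-degree of each gene: a target splits its score equally among
    # the genes that point to it (assumed reversed flow).
    in_deg = {g: 0 for g in genes}
    for _, target in edges:
        in_deg[target] += 1

    score = dict(prior)
    for _ in range(max_iter):
        # Start from the prognosis-based prior, then add network flow.
        new = {g: (1 - damping) * prior[g] for g in genes}
        for source, target in edges:
            new[source] += damping * score[target] / in_deg[target]
        delta = sum(abs(new[g] - score[g]) for g in genes)
        score = new
        if delta < tol:
            break
    return score


# Toy example with hypothetical genes: A has the most outgoing edges.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
toy_relevance = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
scores = dir_generank_sketch(toy_edges, toy_relevance)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # the gene with the highest score
```

Under these assumptions, a signature of a given size would be taken as the top entries of `ranked`, for example `ranked[:40]` on a real network. With equal priors, the toy example ranks the gene with the most outgoing edges first, which matches the abstract's intuition that high out-degree genes are more likely to be modulators.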