Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA.
BioData Min. 2014 Sep 9;7:20. doi: 10.1186/1756-0381-7-20. eCollection 2014.
Effective prediction of cancer clinical outcomes, aimed at understanding the mechanisms of various cancer types, has been pursued using molecular data such as gene expression profiles, an approach that promises better diagnostics and support for further therapies. However, clinical outcome predictions based on gene expression profiles vary between independent data sets. Furthermore, outcome prediction from single-gene expression is of limited use for cancer evaluation, since genes do not act in isolation but interact with other genes in complex signaling or regulatory networks. In addition, since pathways are likely to cooperate, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner.
Thus, we propose a novel approach for identifying knowledge-driven genomic interactions and applying them to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). To demonstrate the utility of the proposed approach, ovarian cancer data from The Cancer Genome Atlas (TCGA) were used to predict clinical stage as a pilot project.
We identified knowledge-driven genomic interactions associated with cancer stage not only from single knowledge bases, such as sources of pathway-pathway interactions, but also across different sets of knowledge bases, such as pathway-protein family interactions, by integrating different types of information. Notably, a model integrating different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models built from gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge.
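For readers unfamiliar with the metric, balanced accuracy is commonly defined for a two-class problem as the mean of sensitivity and specificity, which keeps a majority class from dominating the score. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the 0/1 label encoding, the function name, and the toy labels are assumptions made for illustration only.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded 0/1.

    Hypothetical helper for illustration; the label encoding is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Toy, made-up labels (not data from the study): four positives, two negatives.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)  # (0.75 + 0.5) / 2
```

On these toy labels, plain accuracy is 4/6 while balanced accuracy is 0.625, which shows how the metric penalizes weak performance on the smaller class.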
The success of the pilot study presented here will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding the underlying tumorigenesis and progression of ovarian cancer through a global view of interactions within and between different biological knowledge sources has the potential to provide more effective screening strategies and therapeutic targets for many types of cancer.