Srivastava Alok, Kumar Suraj, Ramaswamy Ramakrishna
C R RAO Advanced Institute of Mathematics, Statistics and Computer Science, University of Hyderabad Campus, Hyderabad 500046, India.
BMC Syst Biol. 2014 Jul 5;8:81. doi: 10.1186/1752-0509-8-81.
Genomic, proteomic and high-throughput gene expression data, when integrated, can be used to map the interaction networks between genes and proteins. Different approaches have been used to analyze these networks, especially in cancer, where mutations in biologically related genes that encode mutually interacting proteins are believed to be involved. This system of integrated networks as a whole exhibits emergent biological properties that are not obvious at the level of the individual networks. We analyze the system in terms of modules, namely sets of densely interconnected nodes that can be further divided into submodules that are expected to participate in multiple biological activities in a coordinated manner.
In the present work we construct two layers of the breast cancer network: the gene layer, where the correlation network of breast cancer genes is analyzed to identify gene modules, and the protein layer, where each gene module is extended to map out the network of expressed proteins and their interactions in order to identify submodules. Each module and its associated submodules are analyzed to test the robustness of their topological distribution. The constituent biological phenomena are explored using the Gene Ontology (GO). We thus construct a "network of networks", and demonstrate that both the gene and protein interaction networks are modular in nature. By focusing on the ontological classification, we are able to determine the complete GO profiles, which are distributed across different levels of the hierarchy. Within each submodule most of the proteins are biologically correlated, and participate in groups of distinct biological activities.
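As an illustration of the gene layer only, the following minimal sketch builds a correlation network from expression profiles and groups genes into candidate modules. It is not the procedure used in this work: the gene names, expression values, the 0.9 threshold, and the use of connected components as a stand-in for module detection are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.9):
    """Link two genes when |Pearson r| of their profiles >= threshold.

    The threshold is an assumed value chosen for this toy example.
    """
    edges = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges[g1].add(g2)
            edges[g2].add(g1)
    return edges


def modules(edges):
    """Return connected components, a simple stand-in for gene modules."""
    seen, result = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        result.append(component)
    return result


# Toy expression profiles: made-up values for hypothetical genes.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
    "geneD": [10.0, 2.0, 8.5, 4.0],
}
toy_network = correlation_network(toy_expression)
toy_modules = modules(toy_network)
```

In this toy data, geneA and geneB rise together, as do geneC and geneD, so the sketch yields two candidate modules. Connected components are the crudest possible grouping; a density-based community detection method would be closer to the notion of "densely interconnected nodes" described above, and the protein layer would then extend each module with protein interaction data.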
The present approach is an effective method for discovering coherent gene modules and protein submodules. We show that it also provides a means of determining biological pathways (both novel and previously reported) that are related, in the present instance, to breast cancer. Similar strategies are likely to be useful in the analysis of other diseases as well.