Ideker Trey, Ozier Owen, Schwikowski Benno, Siegel Andrew F
Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA; Institute for Systems Biology, Seattle, WA 98103, USA.
Bioinformatics. 2002;18 Suppl 1:S233-40. doi: 10.1093/bioinformatics/18.suppl_1.s233.
In model organisms such as yeast, large databases of protein-protein and protein-DNA interactions have become an extremely important resource for the study of protein function, evolution, and gene regulatory dynamics. In this paper we demonstrate that by integrating these interactions with widely available mRNA expression data, it is possible to generate concrete hypotheses about the mechanisms underlying the observed changes in gene expression. To perform this integration systematically and at large scale, we introduce an approach for screening a molecular interaction network to identify active subnetworks, i.e., connected regions of the network that show significant changes in expression over particular subsets of conditions. The method we present here combines a rigorous statistical measure for scoring subnetworks with a search algorithm for identifying high-scoring subnetworks.
We evaluated our procedure on a small network of 332 genes and 362 interactions and on a large network of 4160 genes containing all 7462 protein-protein and protein-DNA interactions in the yeast public databases. In the small network, we identified five significant subnetworks that covered 41 of the 77 (53%) significant changes in expression. Both network analyses returned several top-scoring subnetworks that correspond well to known regulatory mechanisms in the literature. These results demonstrate how large-scale genomic approaches may be used to uncover signalling and regulatory pathways in a systematic, integrative fashion.
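The idea of pairing a subnetwork score with a search for high-scoring connected regions can be illustrated with a minimal sketch. The abstract does not specify either component, so everything here is an assumption for illustration: the score is a Stouffer-style combined z-score (sum of per-gene z-scores divided by the square root of the subnetwork size), the search is a simple greedy expansion from a seed gene, and the function names, gene labels, and z-scores are hypothetical.

```python
import math


def aggregate_score(z_scores):
    """Assumed score: Stouffer-style combined z, sum(z) / sqrt(k)."""
    k = len(z_scores)
    return sum(z_scores) / math.sqrt(k) if k else 0.0


def greedy_subnetwork(adjacency, z, seed):
    """Grow a connected subnetwork from `seed` by greedy expansion.

    At each step, add the neighbouring gene that raises the aggregate
    score the most; stop when no neighbour improves it. Growing only
    through neighbours keeps the subnetwork connected.
    """
    members = {seed}
    best = aggregate_score([z[seed]])
    while True:
        # Genes adjacent to the current subnetwork but not yet in it.
        frontier = {n for m in members for n in adjacency[m]} - members
        candidate, candidate_score = None, best
        for gene in sorted(frontier):  # sorted for deterministic ties
            score = aggregate_score([z[m] for m in members] + [z[gene]])
            if score > candidate_score:
                candidate, candidate_score = gene, score
        if candidate is None:
            return members, best
        members.add(candidate)
        best = candidate_score


# Hypothetical toy network: gene -> set of interaction partners.
toy_adjacency = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
# Hypothetical per-gene z-scores for significance of expression change.
toy_z = {"A": 2.0, "B": 2.5, "C": -1.0, "D": 3.0, "E": 0.1}

subnetwork, score = greedy_subnetwork(toy_adjacency, toy_z, seed="A")
print(sorted(subnetwork), round(score, 3))
```

A greedy search of this kind can stall in local optima, so a practical screen would likely restart from many seeds or use a stochastic search, and would calibrate scores against randomly drawn gene sets of the same size before calling a subnetwork significant.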