School of Computer Science and Engineering, Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences and The Sudarsky Center for Computational Biology, The Hebrew University, Jerusalem, 91904 Israel.
Bioinformatics. 2011 Jul 1;27(13):i142-8. doi: 10.1093/bioinformatics/btr201.
Much of the large-scale molecular data from living cells can be represented as networks, which occupy a central position in cellular systems biology. In the protein-protein interaction (PPI) network, nodes represent proteins and edges represent experimentally supported interactions between them. Because PPI networks are rich and complex, a mathematical model is sought to capture their properties and shed light on PPI evolution. The mathematical literature contains various generative models of random graphs. Which of these models (if any) can properly reproduce various biologically interesting networks remains a major and still largely open question. Here, we consider this problem for the PPI network of Saccharomyces cerevisiae. We seek to distinguish between a family of models that copy neighbors, represented by the duplication-divergence (DD) model, and models that do not copy neighbors, with the Barabási-Albert (BA) preferential attachment model as a leading example.
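To make the contrast between the two model families concrete, the following is a minimal sketch of one growth step under each. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the retention probability `p_keep`, the number of attachments `m`, and the choice not to link a duplicated node to its parent are all hypothetical simplifications.

```python
import random


def complete_graph(n):
    """Hypothetical seed graph: adjacency sets for a complete graph on nodes 0..n-1."""
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def ba_step(adj, m, rng):
    """Preferential attachment: add one node linked to m distinct existing nodes,
    each chosen with probability proportional to its current degree."""
    new = len(adj)
    # A node appears once per incident edge, so uniform sampling is degree-proportional.
    pool = [u for u, nbrs in adj.items() for _ in nbrs]
    targets = set()
    while len(targets) < m:
        targets.add(rng.choice(pool))
    adj[new] = set(targets)
    for t in targets:
        adj[t].add(new)
    return new


def dd_step(adj, p_keep, rng):
    """Duplication-divergence (simplified): copy a random parent's neighbors,
    keeping each copied edge independently with probability p_keep."""
    new = len(adj)
    parent = rng.choice(sorted(adj))
    kept = {v for v in sorted(adj[parent]) if rng.random() < p_keep}
    adj[new] = kept
    for v in kept:
        adj[v].add(new)
    return new, parent
```

In this sketch a duplicated node and its parent share every retained neighbor, so each duplication creates a biclique with the pair on one side and their common neighbors on the other. A preferential-attachment step has no such copying mechanism.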
The observed property of the network is the distribution of maximal bicliques in the graph. This is a novel criterion for distinguishing between models in this area. It is particularly appropriate for this purpose, since it reflects the graph's growth pattern under either model. This test clearly favors the DD model. In particular, for the BA model, the vast majority (92.9%) of the bicliques with both sides of size ≥4 must already be embedded in the model's seed graph, whereas the corresponding figure for the DD model is only 5.1%. Our results, based on the biclique perspective, conclusively show that a naïve, unmodified DD model can capture a key aspect of PPI networks.
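As an illustration of the statistic described above, the following sketch enumerates maximal bicliques in a small graph and tabulates their side sizes. It is a brute-force approach that is exponential in the number of nodes and suitable only for toy inputs; it is not the enumeration method used in the paper, and all function names are hypothetical. A biclique here is a pair of disjoint node sets in which every node on one side is adjacent to every node on the other.

```python
from collections import Counter
from itertools import combinations


def common_neighbors(adj, nodes):
    """Nodes adjacent to every member of `nodes` (empty set if `nodes` is empty)."""
    sets = [adj[v] for v in nodes]
    return set.intersection(*sets) if sets else set()


def maximal_bicliques(adj):
    """Return maximal bicliques as unordered pairs of frozensets.

    For any node subset S, taking right = N(S) and left = N(right) yields a pair
    that cannot be extended on either side, and every maximal biclique arises so.
    """
    found = set()
    nodes = sorted(adj)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            right = common_neighbors(adj, subset)
            if not right:
                continue
            left = common_neighbors(adj, right)
            found.add(frozenset((frozenset(left), frozenset(right))))
    return found


def size_distribution(bicliques):
    """Count maximal bicliques by (smaller side, larger side) sizes."""
    return Counter(tuple(sorted(len(side) for side in pair)) for pair in bicliques)


def count_with_min_side(bicliques, k):
    """Number of maximal bicliques whose two sides both have at least k nodes."""
    return sum(1 for pair in bicliques if min(len(side) for side in pair) >= k)
```

For example, on the complete bipartite graph with sides {0, 1} and {2, 3, 4}, `maximal_bicliques` returns a single biclique of sizes (2, 3), and `count_with_min_side(..., 4)` is zero, mirroring the "both sides ≥4" filter in the text.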