Rzhetsky A, Gomez S M
Columbia Genome Center, Columbia University, New York, NY 10032, USA.
Bioinformatics. 2001 Oct;17(10):988-96. doi: 10.1093/bioinformatics/17.10.988.
Current growth in the field of genomics has provided a number of exciting approaches to the modeling of evolutionary mechanisms within the genome. Separately, dynamical and statistical analyses of networks such as the World Wide Web and the social interactions existing between humans have shown that these networks can exhibit common fractal properties, including the property of being scale-free. This work attempts to bridge these two fields and demonstrate that the fractal properties of molecular networks are linked to the fractal properties of their underlying genomes.
We suggest a stochastic model capable of describing the evolutionary growth of metabolic or signal-transduction networks. This model generates networks that share important statistical properties (so-called scale-free behavior) with real molecular networks. In particular, the frequency of vertices connected to exactly k other vertices follows a power-law distribution. The shape of this distribution remains invariant to changes in network scale: a small subgraph has the same distribution as the complete graph from which it is derived. Furthermore, the model correctly predicts that the frequencies of distinct DNA and protein domains also follow a power-law distribution. Finally, the model leads to a simple equation linking the total number of different DNA and protein domains in a genome with both the total number of genes and the overall network topology.
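As an illustration of the power-law statement above, the following minimal Python sketch tabulates the fraction of vertices connected to exactly k other vertices and estimates the slope of that distribution on log-log axes. It is not the authors' stochastic model or their MatLab code. The function names `degree_frequencies` and `loglog_slope` are hypothetical, the network is assumed to be an undirected edge list, and the least-squares fit is a rough diagnostic chosen for brevity rather than a rigorous power-law test.

```python
import math
from collections import Counter


def degree_frequencies(edges):
    """Return {k: fraction of vertices with exactly k neighbours}.

    Assumes an undirected edge list; self-loops and repeated edges are ignored.
    Vertices that appear in no edge are not counted, so k is always >= 1.
    """
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    counts = Counter(len(nbrs) for nbrs in neighbours.values())
    n = len(neighbours)
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(freqs):
    """Least-squares slope of log P(k) against log k.

    For a distribution of the form P(k) ~ k**(-gamma), the slope is about -gamma.
    """
    xs = [math.log(k) for k in freqs]
    ys = [math.log(p) for p in freqs.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degrees")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Applying both functions to a network and to a subgraph of it, then comparing the two slopes, gives a simple check of the scale-invariance property described above. On very small graphs the estimate says little: a star with four leaves, for example, has only two distinct degrees.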
MatLab (MathWorks, Inc.) programs described in this manuscript are available on request from the authors.