Department of Biosciences, Brookhaven National Laboratory, Upton, NY 11973, USA.
Proc Natl Acad Sci U S A. 2013 Apr 9;110(15):6235-9. doi: 10.1073/pnas.1217795110. Epub 2013 Mar 25.
Bacterial genomes and large-scale computer software projects both consist of a large number of components (genes or software packages) connected via a network of mutual dependencies. Components can be easily added to or removed from individual systems, and their use frequencies vary over many orders of magnitude. We study this frequency distribution in genomes of ∼500 bacterial species and in over 2 million Linux computers and find that in both cases it is described by the same scale-free power-law distribution with an additional peak near the tail of the distribution corresponding to nearly universal components. We argue that the existence of a power-law distribution of component frequencies is a general property of any modular system with a multilayered dependency network. We demonstrate that the frequency of a component is positively correlated with its dependency degree, defined as the total number of upstream components whose operation directly or indirectly depends on the selected component. The observed frequency/dependency degree distributions are reproduced in a simple, mathematically tractable model introduced and analyzed in this study.
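The two quantities compared in the abstract, a component's frequency across systems and its dependency degree, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the package names, the dependency map, and the four toy systems are invented for illustration and are not data from the study. The sketch only shows how the two quantities could be counted; it is not the model analyzed in the paper.

```python
from collections import Counter

# Hypothetical toy dependency map: component -> components it directly requires.
DEPENDS_ON = {
    "libc": [],
    "zlib": ["libc"],
    "openssl": ["libc", "zlib"],
    "curl": ["openssl", "zlib"],
    "git": ["curl", "zlib"],
    "editor": ["libc"],
}


def dependency_degree(depends_on):
    """For each component, count the components that directly or
    indirectly depend on it (its transitive dependents)."""
    # Invert the map: component -> components that directly require it.
    dependents = {name: set() for name in depends_on}
    for name, requirements in depends_on.items():
        for req in requirements:
            dependents[req].add(name)

    degree = {}
    for name in depends_on:
        seen = set()
        stack = list(dependents[name])
        while stack:  # depth-first walk over everything built on top of `name`
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(dependents[current])
        degree[name] = len(seen)
    return degree


def install(top_level, depends_on):
    """Return the set of components present on a system that selects
    `top_level` components and pulls in everything they require."""
    installed = set()
    stack = list(top_level)
    while stack:
        current = stack.pop()
        if current not in installed:
            installed.add(current)
            stack.extend(depends_on[current])
    return installed


def component_frequency(systems):
    """Fraction of systems in which each component is present."""
    counts = Counter(component for system in systems for component in system)
    return {component: n / len(systems) for component, n in counts.items()}


# Hypothetical systems, each defined by the components its user chose directly.
CHOICES = [["git"], ["curl"], ["editor"], ["openssl", "editor"]]
systems = [install(choice, DEPENDS_ON) for choice in CHOICES]

degree = dependency_degree(DEPENDS_ON)
frequency = component_frequency(systems)
for name in sorted(degree, key=degree.get, reverse=True):
    print(f"{name:8s} degree={degree[name]}  frequency={frequency[name]:.2f}")
```

In this toy example the component with the largest dependency degree (`libc`) is present on every system, while a component that nothing else depends on (`git`) appears only where it was chosen directly. This matches the direction of the correlation described above, although a handful of invented systems cannot say anything about the shape of the distribution.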