Laboratoire de Physique Théorique du CNRS, IRSAMC, Université de Toulouse, UPS, Toulouse, France.
PLoS One. 2013 May 9;8(5):e61519. doi: 10.1371/journal.pone.0061519. Print 2013.
For DNA sequences of various species we construct the Google matrix of Markov transitions between nearby words composed of several letters. The statistical distribution of the elements of this matrix is shown to follow a power law with an exponent close to that of outgoing links in scale-free networks such as the World Wide Web (WWW). At the same time, the sum of ingoing matrix elements is characterized by an exponent significantly larger than those typical of WWW networks. This results in a slow algebraic decay of the PageRank probability, which is determined by the distribution of ingoing elements. The spectrum of the Google matrix is characterized by a large gap, leading to a rapid relaxation process on the DNA sequence networks. We introduce the PageRank proximity correlator between different species, which determines their statistical similarity from the viewpoint of Markov chains. The properties of other eigenstates of the Google matrix are also discussed. Our results establish scale-free features of DNA sequence networks and show their similarities to, and distinctions from, the WWW and linguistic networks.
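The construction summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: words are taken as consecutive non-overlapping blocks of `m` letters, the symbol `G` and the damping factor `alpha = 0.85` follow the usual PageRank convention, columns with no outgoing transitions are replaced by uniform columns, and the function names `google_matrix` and `pagerank` are hypothetical.

```python
import numpy as np
from itertools import product


def google_matrix(seq, m=2, alpha=0.85):
    """Build a column-stochastic Google matrix for m-letter DNA words.

    Assumptions (not taken from the abstract): words are consecutive,
    non-overlapping blocks of m letters, and alpha is the usual damping factor.
    """
    words = ["".join(p) for p in product("ACGT", repeat=m)]
    index = {w: i for i, w in enumerate(words)}
    n = len(words)

    # counts[j, i] = number of observed transitions from word i to word j
    counts = np.zeros((n, n))
    chunks = [seq[k:k + m] for k in range(0, len(seq) - m + 1, m)]
    for src, dst in zip(chunks, chunks[1:]):
        # skip blocks containing letters other than A, C, G, T
        if src in index and dst in index:
            counts[index[dst], index[src]] += 1

    # Normalize each column to sum to 1; columns without any outgoing
    # transition (dangling words) are kept uniform.
    col_sums = counts.sum(axis=0)
    S = np.full((n, n), 1.0 / n)
    nonzero = col_sums > 0
    S[:, nonzero] = counts[:, nonzero] / col_sums[nonzero]

    G = alpha * S + (1.0 - alpha) / n
    return G, words


def pagerank(G, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary vector p satisfying G p = p."""
    p = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        p_next = G @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    toy_sequence = "ACGTTGCAACGTAGCTGGATCCAT" * 10  # made-up toy data
    G, words = google_matrix(toy_sequence, m=2)
    p = pagerank(G)
    ranking = np.argsort(-p)  # word indices by decreasing PageRank
    print([words[i] for i in ranking[:5]])
```

Under the same assumptions, `np.linalg.eigvals(G)` returns the full spectrum, from which the gap between the leading eigenvalue 1 and the next largest modulus can be read off; with a damping factor `alpha < 1` that gap is at least `1 - alpha` by construction, so studying the intrinsic gap of the sequence network would require `alpha = 1`.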