Chen Hong-Da, Chang Chang-Heng, Hsieh Li-Ching, Lee Hoong-Chien
Department of Physics, National Central University, Chungli, Taiwan 320, Republic of China.
Phys Rev Lett. 2005 May 6;94(17):178103. doi: 10.1103/PhysRevLett.94.178103. Epub 2005 May 5.
Shannon information (SI) and its special case, divergence, are defined for a DNA sequence in terms of probabilities of chemical words in the sequence and are computed for a set of complete genomes highly diverse in length and composition. We find the following: SI (but not divergence) is inversely proportional to sequence length for a random sequence but is length independent for genomes; the genomic SI is always greater and, for shorter words and longer sequences, hundreds to thousands of times greater than the SI in a random sequence whose length and composition match those of the genome; genomic SIs appear to have word-length-dependent universal values. The universality is inferred to be an evolutionary footprint of a universal mode for genome growth.
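The abstract does not give the formula for SI, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes SI for words of length k is the gap between the maximum possible entropy, k·ln 4, and the entropy of the observed word frequencies, with words counted in an overlapping sliding window. The function names `word_counts`, `shannon_information`, and `random_sequence` are hypothetical, and the random sequences use equal base probabilities rather than a genome-matched composition.

```python
import math
import random
from collections import Counter


def word_counts(seq, k):
    """Count overlapping words (k-mers) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_information(seq, k, alphabet_size=4):
    """ASSUMED definition: k*ln(alphabet_size) minus the word-frequency entropy.

    The result is zero when all words are equally frequent and grows as the
    word distribution becomes more uneven.
    """
    counts = word_counts(seq, k)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return k * math.log(alphabet_size) - entropy


def random_sequence(length, seed=0):
    """Uniform random DNA sequence (equal base probabilities, an assumption)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    # Compare SI of random sequences of increasing length for 2-letter words.
    for length in (1_000, 10_000, 100_000):
        si = shannon_information(random_sequence(length), k=2)
        print(length, si)
```

Under this assumed definition, the printed values for random sequences should shrink as the length grows, which is the behavior the abstract reports for random sequences. Running the same function on a real genome and on a random sequence of matching length would be one way to explore the reported gap between the two.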