Blaisdell B E
J Mol Evol. 1984;21(3):278-88. doi: 10.1007/BF02102360.
Sixty-four eucaryotic nuclear DNA sequences, half of them coding and half noncoding, were examined as realizations of first-, second-, or third-order Markov chains. Standard statistical tests showed that most of the sequences required at least a second-order Markov chain for their representation, and some required a third-order chain. For all 64 sequences the observed one-step second-order transition count matrices were effective in predicting the two-step transition count matrices, and for 56 of the 64 they were effective in predicting the three-step transition count matrices. The departure of the observed first- and second-order transition count matrices from random expectation indicates that a considerable sample of eucaryotic nuclear DNA sequences, both protein-coding and noncoding, has significant local structure over subsequences of three to five contiguous bases, and that this structure occurs throughout the total length of each sequence. These results suggest that present DNA sequences may have arisen from the duplication, concatenation, and gradual modification of very early short sequences.
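To illustrate what testing the Markov order of a DNA sequence can look like in practice, the following is a minimal sketch in Python. It is not taken from the paper: the abstract says only that "standard statistical tests" were used, so the likelihood-ratio (G) statistic below is an assumed stand-in, and the function names `transition_counts` and `order_test` are hypothetical. The sketch assumes an upper-case sequence containing only A, C, G, and T.

```python
from collections import Counter
from math import log


def transition_counts(seq, order):
    """Count (context, next_base) pairs, where context is the preceding `order` bases."""
    counts = Counter()
    for i in range(order, len(seq)):
        counts[(seq[i - order:i], seq[i])] += 1
    return counts


def order_test(seq, order):
    """Likelihood-ratio statistic for H0: order-(order-1) chain vs H1: order-`order` chain.

    Returns (G, degrees_of_freedom). Under H0, G is approximately chi-square
    distributed with the returned degrees of freedom (assumes a 4-letter
    alphabet and order >= 1).
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    counts = transition_counts(seq, order)

    context_total = Counter()   # n(context)
    short_counts = Counter()    # n(shorter context, next_base)
    short_total = Counter()     # n(shorter context)
    for (context, nxt), n in counts.items():
        context_total[context] += n
        short_counts[(context[1:], nxt)] += n   # drop the most distant base
        short_total[context[1:]] += n

    g = 0.0
    for (context, nxt), n in counts.items():
        expected_ratio = (n * short_total[context[1:]]) / (
            context_total[context] * short_counts[(context[1:], nxt)]
        )
        g += 2.0 * n * log(expected_ratio)

    degrees_of_freedom = 9 * 4 ** (order - 1)
    return g, degrees_of_freedom


if __name__ == "__main__":
    # Toy sequence for illustration only; not data from the paper.
    toy = "ACGT" * 50
    for k in (1, 2, 3):
        g, df = order_test(toy, k)
        print(f"order {k - 1} vs {k}: G = {g:.2f}, df = {df}")
```

A large G relative to a chi-square distribution with the reported degrees of freedom would argue for the higher order. Converting G to a p-value requires a chi-square survival function, which is left out here to keep the sketch free of third-party dependencies.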