Sitnikova T L, Zharkikh A A
Institute of Cytology and Genetics, Russian Academy of Sciences, Novosibirsk.
Biosystems. 1993;30(1-3):113-35. doi: 10.1016/0303-2647(93)90066-l.
This work is an attempt to study the structural features and evolutionary patterns of nucleotide sequences by analyzing their 1- through 4-plet frequencies and the statistical relations between them. We present a mathematical apparatus for this analysis. In particular, we introduce criteria to estimate the degree of homogeneity of L-plet composition in a given set of sequences and the dependence of L-plet frequencies on lower-order composition. We apply these criteria to the study of eubacteria, mitochondria and chloroplasts. We demonstrate that L-plet frequencies are quite useful for revealing evolutionary relationships between DNA sequences and that non-random distribution is more typical of doublets than of triplets. Non-randomness of triplet composition is more characteristic of coding than of non-coding regions, whereas no significant differences between these regions are observed in dinucleotide composition. The results obtained can be used to reveal possible mechanisms of the codon usage phenomenon.
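The abstract does not spell out the paper's criteria, so the following is only a minimal sketch of the kind of computation it describes, built from standard substitutes rather than the authors' own formulas. It assumes overlapping word counts, an observed/expected ratio in which the expected L-plet frequency is built from (L-1)- and (L-2)-plet frequencies (a Markov-type estimate that reduces to f(x)f(y) for doublets), and a Pearson chi-square statistic for homogeneity of L-plet composition across a set of sequences. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def lplet_counts(seq: str, L: int) -> Counter:
    """Count overlapping L-plets in seq, skipping words with non-ACGT symbols."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - L + 1):
        word = seq[i:i + L]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def lplet_freqs(seq: str, L: int) -> dict:
    """Relative frequencies of all 4**L possible L-plets (zeros included)."""
    if L == 0:
        return {"": 1.0}  # empty word: lets the L == 2 case reduce cleanly
    counts = lplet_counts(seq, L)
    total = sum(counts.values())
    words = ("".join(p) for p in product(ALPHABET, repeat=L))
    return {w: (counts[w] / total if total else 0.0) for w in words}


def expected_from_lower_order(seq: str, word: str) -> float:
    """Hypothetical expectation of `word` from lower-order composition.

    f(x1..xL) ~ f(x1..x{L-1}) * f(x2..xL) / f(x2..x{L-1});
    for L == 2 this is the independence product f(x1) * f(x2).
    """
    L = len(word)
    if L < 2:
        raise ValueError("word must have length >= 2")
    upper = lplet_freqs(seq, L - 1)
    middle = lplet_freqs(seq, L - 2)
    denom = middle[word[1:-1]]
    return upper[word[:-1]] * upper[word[1:]] / denom if denom else 0.0


def relative_abundance(seq: str, word: str) -> float:
    """Observed / expected ratio; values far from 1 suggest non-random usage."""
    expected = expected_from_lower_order(seq, word)
    observed = lplet_freqs(seq, len(word))[word]
    return observed / expected if expected else float("nan")


def homogeneity_chi2(seqs: list, L: int) -> tuple:
    """Pearson chi-square for homogeneity of L-plet composition across seqs.

    Rows are sequences, columns are observed L-plets; returns (statistic, df).
    Overlapping words are not independent, so the nominal df is approximate.
    """
    table = [lplet_counts(s, L) for s in seqs]
    words = sorted(set().union(*table))
    row_tot = [sum(r.values()) for r in table]
    col_tot = {w: sum(r[w] for r in table) for w in words}
    grand = sum(row_tot)
    stat = 0.0
    for row, n_row in zip(table, row_tot):
        for w in words:
            e = n_row * col_tot[w] / grand  # expected count under homogeneity
            if e > 0:
                stat += (row[w] - e) ** 2 / e
    df = (len(seqs) - 1) * (len(words) - 1)
    return stat, df


if __name__ == "__main__":
    print(relative_abundance("ACACACAC", "AC"))     # AC over-represented (> 1)
    print(homogeneity_chi2(["AAAA", "CCCC"], 1))    # (8.0, 1)
```

Overlapping counting and the chi-square approximation are convenience choices for the sketch; converting the statistic to a p-value (for example with a chi-square distribution function from a statistics library) is left out to keep the block dependency-free.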