Chen Yaw-Hwang, Nyeo Su-Long, Yeh Chiung-Yuh
Department of Electronics Engineering, Kun Shan University of Technology, Yung-Kang, Tainan Hsien, Taiwan, Republic of China.
Phys Rev E Stat Nonlin Soft Matter Phys. 2005 Jul;72(1 Pt 1):011908. doi: 10.1103/PhysRevE.72.011908. Epub 2005 Jul 18.
Evolutionary features of the distributions of k-mers in the DNA sequences of various organisms are studied. The organisms are classified into three groups based on their evolutionary periods: (a) E. coli and T. pallidum; (b) yeast, zebrafish, A. thaliana, and fruit fly; (c) mouse, chicken, and human. The distributions of 6-mers for these three groups are shown to be, respectively, (a) unimodal, (b) unimodal with peaks generally shifted to smaller frequencies of occurrence, and (c) bimodal. To describe the bimodal feature of the k-mer distributions of group (c), a model based on the cytosine-guanine (CG) content of the DNA sequences is introduced and shown to provide reasonably good agreement.
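The kind of distribution described in the abstract can be illustrated with a minimal sketch: count every overlapping k-mer in a sequence, then tabulate how many distinct k-mers share each occurrence count. This is not the authors' code. The function names (kmer_counts, occurrence_distribution, split_by_cg) are hypothetical, and splitting k-mers by whether they contain the dinucleotide "CG" is only one assumed reading of "CG content", not the model from the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_counts(seq, k=6):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts

def occurrence_distribution(counts, k=6):
    """Map each occurrence count to the number of distinct k-mers with
    that count, including k-mers that never occur (count 0)."""
    dist = Counter()
    for kmer in map("".join, product(ALPHABET, repeat=k)):
        dist[counts.get(kmer, 0)] += 1
    return dist

def split_by_cg(counts):
    """Assumed illustration only: separate k-mers that contain the
    dinucleotide 'CG' from those that do not."""
    with_cg = Counter({m: c for m, c in counts.items() if "CG" in m})
    without_cg = Counter({m: c for m, c in counts.items() if "CG" not in m})
    return with_cg, without_cg

if __name__ == "__main__":
    toy = "ACGTACGT"  # toy input, not real genomic data
    counts = kmer_counts(toy, k=2)
    print(sorted(occurrence_distribution(counts, k=2).items()))
```

For a real genome, a histogram of occurrence_distribution(counts, k=6) would be the quantity whose shape (unimodal or bimodal) the abstract discusses; in practice the counts would typically be normalized by sequence length before organisms are compared.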