Guo Feng-Biao, Yu Xiu-Juan
School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
BMC Genomics. 2007 Oct 10;8:366. doi: 10.1186/1471-2164-8-366.
The nucleotide compositional asymmetry between the leading and lagging strands of bacterial genomes has been studied intensively over the past few years. Notably, almost all bacterial genomes exhibit the same kind of base asymmetry. This work aims to investigate the strand biases in the Chlamydia muridarum genome and to show the potential of the Z curve method for quantitatively differentiating genes on the leading and lagging strands.
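To make the Z curve concrete, here is a minimal sketch. It assumes the standard cumulative definitions x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n) and z_n = (A_n + T_n) - (G_n + C_n), where A_n, C_n, G_n and T_n count each base among the first n positions. The function name `z_curve` and the choice to skip non-ACGT characters are illustrative assumptions. Because y_n = (A_n - T_n) + (C_n - G_n), the y curve falls wherever G exceeds C and T exceeds A, which is the leading-strand bias this abstract describes.

```python
def z_curve(seq):
    """Cumulative Z-curve components (x_n, y_n, z_n) for each prefix of seq."""
    a = c = g = t = 0
    xs, ys, zs = [], [], []
    for base in seq.upper():
        if base == "A":
            a += 1
        elif base == "C":
            c += 1
        elif base == "G":
            g += 1
        elif base == "T":
            t += 1
        else:
            continue  # skip N and other ambiguity codes (illustrative choice)
        xs.append((a + g) - (c + t))  # purines vs pyrimidines
        ys.append((a + c) - (g + t))  # amino (A, C) vs keto (G, T) bases
        zs.append((a + t) - (g + c))  # weak vs strong hydrogen bonding
    return xs, ys, zs


# A G/T-rich stretch makes y fall at every step.
xs, ys, zs = z_curve("GGTTGT")
print(ys)  # [-1, -2, -3, -4, -5, -6]
```

Applied to a whole chromosome read along one strand, one would expect the slope of y to change sign near the replication origin and terminus, since the strand being read switches between leading and lagging at those points.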
The base frequencies of protein-coding genes in the C. muridarum genome were analyzed with the Z curve method. Genes located on the two replication strands were found to have distinct base usages. According to their positions in the 9-D space spanned by the Z curve variables u1 to u9, the K-means clustering algorithm assigns about 94% of the genes to the correct strand, a few percentage points higher than the accuracy of K-means clustering based on relative synonymous codon usage (RSCU). Base usage and codon usage analyses show that genes on the leading strand contain more G than C and more T than A, particularly at the third codon position; for genes on the lagging strand, these biases are reversed. The y components of the Z curves for the complete chromosome sequences show that the excess of G over C and of T over A is more pronounced in the C. muridarum genome than in other bacterial genomes that lack separate base and/or codon usages. Furthermore, the genomes of Borrelia burgdorferi, Treponema pallidum, Chlamydia muridarum and Chlamydia trachomatis, in which distinct base and/or codon usages have been observed, show shorter phylogenetic distances among themselves than are found among other bacterial genomes.
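The classification step can be sketched as follows. This sketch assumes that u1 to u9 are the x, y and z components computed from base frequencies at codon positions 1, 2 and 3, in that order (the ordering is an assumption). It uses a plain two-cluster K-means (Lloyd's algorithm) with a deterministic farthest-point initialization. The function names, the initialization and the toy sequences are hypothetical and are not taken from the paper. Because cluster numbers are arbitrary, accuracy is scored under both ways of matching clusters to strands.

```python
def zcurve_params(cds):
    """Nine Z-curve variables of a coding sequence.

    Assumed ordering: (x, y, z) at codon position 1, then 2, then 3.
    """
    counts = [{"A": 0, "C": 0, "G": 0, "T": 0} for _ in range(3)]
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    for i in range(usable):
        base = cds[i].upper()
        if base in counts[i % 3]:  # ignore non-ACGT characters
            counts[i % 3][base] += 1
    params = []
    for pos in counts:
        total = sum(pos.values()) or 1
        a, c, g, t = (pos[b] / total for b in "ACGT")
        params += [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]
    return params


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def kmeans(points, k=2, max_iter=100):
    """Plain Lloyd's K-means with a deterministic farthest-point start."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def strand_accuracy(labels, strands):
    """Fraction of genes whose cluster matches their strand (0/1); cluster ids
    are arbitrary, so both cluster-to-strand matchings are tried."""
    hits = sum(lab == s for lab, s in zip(labels, strands))
    return max(hits, len(labels) - hits) / len(labels)


# Hypothetical toy genes: G/T-rich "leading" vs A/C-rich "lagging".
leading = ["GTT" * 10, "GTT" * 9 + "GTG", "GTT" * 9 + "GGT"]
lagging = ["CAA" * 10, "CAA" * 9 + "CAC", "CAA" * 9 + "CCA"]
points = [zcurve_params(s) for s in leading + lagging]
labels = kmeans(points)
print(labels, strand_accuracy(labels, [0, 0, 0, 1, 1, 1]))  # [0, 0, 0, 1, 1, 1] 1.0
```

Real genes overlap far more than these toy sequences do, so the initialization and the number of restarts would matter much more in practice.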
The nature of the strand biases in base composition in C. muridarum is similar to that in most other bacterial genomes. However, the base composition asymmetry between the leading and lagging strands is more pronounced in C. muridarum than in other bacteria. It is proposed that the strong strand biases of G/C and T/A are responsible for the separate base or codon usages in C. muridarum. In addition, the closer phylogenetic relationship among the four bacterial genomes with separate base and/or codon usages is necessary rather than coincidental. The results also indicate that the Z curve method may be more sensitive than RSCU for the quantitative analysis of DNA sequences.
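For comparison with the Z curve features, RSCU values can be computed per gene. The RSCU of a codon is its observed count divided by the mean count of all codons encoding the same amino acid, so values above 1 indicate preferential use. This sketch assumes the standard genetic code and skips stop codons and the single-codon amino acids Met and Trp. It also assigns 0.0 to codons of amino acids absent from the gene, which is an arbitrary convention. The resulting 59-value vectors could be passed to the same kind of K-means routine, although how the paper built its RSCU vectors is not specified here.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered with T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(cod): aa for cod, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons into synonymous families, one per amino acid.
FAMILIES = {}
for codon, aa in CODE.items():
    FAMILIES.setdefault(aa, []).append(codon)


def rscu(cds):
    """RSCU of each codon: count / mean count of its synonymous family."""
    counts = Counter(cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3))
    values = {}
    for aa, codons in FAMILIES.items():
        if aa == "*" or len(codons) == 1:  # skip stops, Met and Trp
            continue
        total = sum(counts[c] for c in codons)
        for c in codons:
            # 0.0 when the amino acid is absent (arbitrary convention)
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


r = rscu("CTGCTGCTGTTA")  # four Leu codons: 3 x CTG, 1 x TTA
print(r["CTG"], r["TTA"], r["CTT"])  # 4.5 1.5 0.0
```

Within each synonymous family the RSCU values sum to the family size (6 for Leu here), which is a quick sanity check on the implementation.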