Xia Xuhua, Wang Huaichun, Xie Zheng, Carullo Malisa, Huang Huang, Hickey Donal
Department of Biology, University of Ottawa, Ottawa, Ontario, Canada.
Mol Biol Evol. 2006 Jul;23(7):1450-4. doi: 10.1093/molbev/msl012. Epub 2006 May 10.
Previous studies have argued that, given the AT-rich nature of stop codons, the length and GC% of coding sequences (CDSs) should be positively correlated. This prediction is generally supported empirically by prokaryotic genomes. However, the correlation is weak for a number of species, with four species showing a negative correlation. Here we formulate a more general hypothesis that incorporates selection against cytosine (C) usage to explain the lack of a strong positive correlation between the length and GC% of CDSs. Two factors contribute to the selection against C usage in long CDSs. First, C is the least abundant nucleotide in the cell, so a long CDS should contain fewer Cs to increase transcription efficiency. Second, C is prone to mutation to U/T, so selection for increased reliability should reduce C usage in long CDSs. Empirical data from prokaryotic genomes lend strong support to this new hypothesis.
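The quantities discussed in the abstract (CDS length, GC%, and C usage) can be computed with a few lines of code. The following is a minimal sketch, not the authors' method: the abstract does not say which correlation statistic was used, so Pearson's r is an assumption here, the function names are hypothetical, and the three toy sequences are invented purely for illustration.

```python
from math import sqrt


def base_fraction(seq: str, bases: str) -> float:
    """Return the fraction of positions in seq occupied by any base in `bases`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(b) for b in bases) / len(seq)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient (assumed statistic, see lead-in)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def length_composition_correlations(cds_list):
    """Correlate CDS length with GC fraction and with C fraction."""
    lengths = [len(s) for s in cds_list]
    gc = [base_fraction(s, "GC") for s in cds_list]
    c = [base_fraction(s, "C") for s in cds_list]
    return {
        "length_vs_GC": pearson(lengths, gc),
        "length_vs_C": pearson(lengths, c),
    }


# Invented toy sequences, not real genes; a real analysis would read all
# annotated CDSs of one genome.
toy_cds = [
    "ATGGCCGCGTAA",
    "ATGGCGGATGAAGGTTAA",
    "ATGAAAGGTGATGAAGGTAAATAA",
]
result = length_composition_correlations(toy_cds)
print(result)
```

Reporting the length-versus-C correlation alongside the length-versus-GC correlation makes it possible to see whether a weak GC% trend coincides with reduced cytosine usage in longer CDSs, which is the pattern the hypothesis predicts.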