Ottawa Hospital Research Institute, Ottawa, Canada.
PLoS One. 2010 Oct 27;5(10):e13431. doi: 10.1371/journal.pone.0013431.
In spite of extensive research on the effects of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous, GC-content-changing substitutions at the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, the GC content at third codon positions. The same holds for individual amino acids: usage of G/C-ending codons generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions at both the first and third codon positions, may be expected to show different codon usage patterns.
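To make the GC3 measure concrete, here is a minimal Python sketch that computes overall GC content and GC3 for an in-frame coding sequence. The function names are illustrative only. The sketch counts every complete codon; many studies instead restrict GC3 to synonymous sites (excluding Met, Trp and stop codons), which this sketch does not do.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence has no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(seq: str) -> float:
    """Fraction of complete, unambiguous codons whose third base is G or C.

    Assumes seq is an in-frame coding sequence; a trailing partial codon is ignored.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # Skip codons containing ambiguity codes such as N.
    codons = [c for c in codons if all(b in "ACGT" for b in c)]
    if not codons:
        raise ValueError("no complete A/C/G/T codons")
    return sum(c[2] in "GC" for c in codons) / len(codons)
```

For the toy sequence `ATGCTGTTGAGGCGCTAA` (codons ATG CTG TTG AGG CGC TAA), overall GC content is 0.5 while GC3 is 5/6, since five of the six third positions are G or C.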
In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears to be nonlinear, even nonmonotonic, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including the nonlinear codon usage of isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability in codon usage in prokaryotes, plants and humans, respectively. When codons are grouped by common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and humans, respectively.
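The following Python sketch illustrates how a continuous-time Markov chain over synonymous codons can produce nonmonotone curves for AGG and TTG. It is not the authors' parameterization; it assumes that each single-nucleotide substitution between synonymous codons occurs at rate g if it changes A/T to G/C, at rate 1 − g if it changes G/C to A/T, and at a fixed `neutral` rate otherwise, with g in (0, 1) standing for GC bias and nonsynonymous changes ignored. All names (`SYNONYMOUS`, `rate`, `stationary`, `codon_usage`, `neutral`) are hypothetical. Because these rates satisfy detailed balance, the stationary frequency of codon c reduces to g^GC(c) (1 − g)^(3 − GC(c)) normalized over its synonymous family, so `neutral` only affects how quickly the chain equilibrates. For leucine this gives TTG usage g(1 − g)/(1 + g), which rises and then falls, peaking at g = √2 − 1 ≈ 0.41; arginine's AGG follows the same curve, and isoleucine's ATC follows the nonlinear g/(2 − g).

```python
# Synonymous codon families from the standard genetic code (subset for illustration).
SYNONYMOUS = {
    "Ile": ["ATT", "ATC", "ATA"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}


def n_gc(codon: str) -> int:
    """Number of G/C bases in a codon."""
    return sum(b in "GC" for b in codon)


def rate(src: str, dst: str, g: float, neutral: float = 0.5) -> float:
    """Assumed rate of a one-base substitution src -> dst (0 unless they differ at exactly one site)."""
    diffs = [(a, b) for a, b in zip(src, dst) if a != b]
    if len(diffs) != 1:
        return 0.0
    a, b = diffs[0]
    if a in "AT" and b in "GC":
        return g  # A/T -> G/C, favoured when g > 0.5
    if a in "GC" and b in "AT":
        return 1.0 - g  # G/C -> A/T
    return neutral  # A<->T or G<->C: GC content unchanged


def stationary(Q: list[list[float]]) -> list[float]:
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination with partial pivoting."""
    n = len(Q)
    # Row j of A is column j of Q (A = Q^T); the last, redundant row is
    # replaced by the normalisation constraint sum(pi) = 1.
    A = [[Q[i][j] for i in range(n)] for j in range(n - 1)] + [[1.0] * n]
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def codon_usage(aa: str, g: float) -> dict[str, float]:
    """Stationary synonymous-codon frequencies for amino acid aa at GC bias g."""
    if not 0.0 < g < 1.0:
        raise ValueError("g must lie strictly between 0 and 1")
    codons = SYNONYMOUS[aa]
    n = len(codons)
    Q = [[rate(codons[i], codons[j], g) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i])  # rows of a rate matrix sum to zero
    return dict(zip(codons, stationary(Q)))
```

For example, evaluating `codon_usage("Leu", g)` for g from 0.1 to 0.9 shows TTG first rising and then falling, while CTG (usage g²/(1 + g)) increases monotonically. Solving the linear system numerically, rather than using the closed form directly, keeps the sketch valid if the rate assumptions are changed to ones that no longer satisfy detailed balance.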
The model clarifies the sometimes counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias, and provides a natural null model against which other influences on codon bias can be measured.
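As an illustration of using such a model as a null, the sketch below compares observed synonymous codon counts with the frequencies expected from GC bias alone. It assumes a simple GC-biased substitution chain whose stationary frequencies weight codon c by g^GC(c) (1 − g)^(3 − GC(c)), normalized within its synonymous family; it takes the GC bias g as given; and it uses observed/expected ratios as a hypothetical deviation measure. The function names are illustrative. Ratios far from 1 flag codons whose usage GC bias alone does not explain.

```python
def gc_count(codon: str) -> int:
    """Number of G/C bases in a codon."""
    return sum(b in "GC" for b in codon.upper())


def null_frequencies(codons: list[str], g: float) -> dict[str, float]:
    """Expected usage within one synonymous family under GC bias alone (assumed closed form)."""
    if not 0.0 < g < 1.0:
        raise ValueError("g must lie strictly between 0 and 1")
    w = {c: g ** gc_count(c) * (1.0 - g) ** (3 - gc_count(c)) for c in codons}
    total = sum(w.values())
    return {c: v / total for c, v in w.items()}


def observed_over_expected(counts: dict[str, int], g: float) -> dict[str, float]:
    """Ratio of observed to null-expected frequency for each codon in one family.

    `counts` is a hypothetical input that must cover every codon of a single
    synonymous family, e.g. all six leucine codons.
    """
    n = sum(counts.values())
    expected = null_frequencies(list(counts), g)
    return {c: (k / n) / expected[c] for c, k in counts.items()}
```

At g = 0.5 every leucine codon has expected frequency 1/6, so counts of 1, 1, 1, 1, 1 and 5 for TTA, TTG, CTT, CTC, CTA and CTG give a ratio of 3 for CTG and 0.6 for each of the others.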