Knight R D, Freeland S J, Landweber L F
Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA.
Genome Biol. 2001;2(4):RESEARCH0010. doi: 10.1186/gb-2001-2-4-research0010. Epub 2001 Mar 22.
Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts the responses of individual codons and amino acids to genome composition, measured as the slope and intercept of each one's regression line on genome GC content.
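The abstract summarizes each codon's or amino acid's response to genome composition as the slope and intercept of a regression on genome GC content. The sketch below illustrates that kind of fit under a deliberately simplified, hypothetical null model. It assumes that all three codon positions share the genome's GC content, that bases are drawn independently, and that there is no selection. This is not the authors' model, which treats codon positions differently. The helper names (`base_freqs`, `codon_freq`, `aa_freq`, `ols`) and the choice of the alanine and lysine codon families are illustrative assumptions.

```python
# Hypothetical null model (assumption, not the paper's model): every codon
# position has the genome's GC content, and bases are chosen independently.

def base_freqs(gc):
    """Base frequencies for a genome with GC fraction `gc` (G = C = gc/2, A = T = (1 - gc)/2)."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}

def codon_freq(codon, gc):
    """Frequency of a codon among all 64 triplets: product of its base frequencies."""
    f = base_freqs(gc)
    p = 1.0
    for base in codon:
        p *= f[base]
    return p

def aa_freq(codons, gc):
    """Frequency of an amino acid: sum of the frequencies of its synonymous codons."""
    return sum(codon_freq(c, gc) for c in codons)

def ols(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

ALA = ["GCT", "GCC", "GCA", "GCG"]  # codon family with GC-rich first two positions
LYS = ["AAA", "AAG"]                # AT-rich codon family

gcs = [0.25 + 0.05 * i for i in range(11)]  # genome GC content from 0.25 to 0.75
for name, codons in [("Ala", ALA), ("Lys", LYS)]:
    slope, intercept = ols(gcs, [aa_freq(codons, g) for g in gcs])
    print(f"{name}: slope={slope:+.4f}, intercept={intercept:+.4f}")
```

Under this null model the alanine family has a positive slope and the lysine family a negative one. Because the predicted frequencies are quadratic in GC content, the straight-line fit is only an approximation over the chosen GC range.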
Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on the measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure.
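The two predictors named here, a codon's GC content relative to its synonyms and an amino acid's mean codon GC content, can be computed directly from the genetic code. The following is a minimal sketch. The `SYNONYMS` table covers only an illustrative subset of the standard genetic code, and `gc_count`, `mean_codon_gc` and `relative_gc` are hypothetical helper names, not the authors' code. Following the abstract, amino acids with higher mean codon GC (such as Gly and Ala) are expected to rise in frequency as genome GC increases, while those with lower values (such as Lys, Phe and Ile) are expected to fall. Within a synonym family, codons with positive relative GC are expected to gain share.

```python
from __future__ import annotations

# Illustrative subset of the standard genetic code (DNA codons); not exhaustive.
SYNONYMS = {
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Glu": ["GAA", "GAG"],
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Ile": ["ATT", "ATC", "ATA"],
}

def gc_count(codon: str) -> int:
    """Number of G or C bases in a codon (0 to 3)."""
    return sum(base in "GC" for base in codon)

def mean_codon_gc(codons: list[str]) -> float:
    """Mean GC content, as a fraction of positions, over an amino acid's codons."""
    return sum(gc_count(c) for c in codons) / (3 * len(codons))

def relative_gc(codons: list[str]) -> dict[str, float]:
    """Each codon's GC count minus the mean GC count of its synonymous codons."""
    mean = sum(gc_count(c) for c in codons) / len(codons)
    return {c: gc_count(c) - mean for c in codons}

# List amino acids from lowest to highest mean codon GC content.
for aa, codons in sorted(SYNONYMS.items(), key=lambda kv: mean_codon_gc(kv[1])):
    print(f"{aa}: mean codon GC = {mean_codon_gc(codons):.3f}, "
          f"relative GC = {relative_gc(codons)}")
```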
Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.