Department of Pathology, University of California San Diego, San Diego, California, United States of America.
PLoS One. 2010 Jul 21;5(7):e11587. doi: 10.1371/journal.pone.0011587.
The single rate codon model of non-synonymous substitution is ubiquitous in phylogenetic modeling. Indeed, the use of a non-synonymous to synonymous substitution rate ratio parameter has facilitated the interpretation of selection pressure on genomes. Although the single rate model has achieved wide acceptance, we argue that the assumption of a single rate of non-synonymous substitution is biologically unreasonable, given observed differences in substitution rates evident from empirical amino acid models. Some have attempted to incorporate amino acid substitution biases into models of codon evolution and have shown improved model performance versus the single rate model. Here, we show that the single rate model of non-synonymous substitution is easily outperformed by a model with multiple non-synonymous rate classes, yet in which amino acid substitution pairs are assigned randomly to these classes. We argue that, since the single rate model is so easy to improve upon, new codon models should not be validated entirely on the basis of improved model fit over this model. Rather, we should strive to both improve on the single rate model and to approximate the general time-reversible model of codon substitution, with as few parameters as possible, so as to reduce model over-fitting. We hint at how this can be achieved with a Genetic Algorithm approach in which rate classes are assigned on the basis of sequence information content.
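As a rough illustration of the random baseline described above, the following minimal Python sketch enumerates the amino acid pairs that are one nucleotide substitution apart under the standard genetic code and assigns each pair to a non-synonymous rate class at random. The function names (`genetic_code`, `single_step_pairs`, `random_rate_classes`), the restriction to single-step pairs, and the uniform assignment with a fixed seed are assumptions made for this example; the abstract does not specify how the random assignment was implemented, and no likelihood fitting is performed here.

```python
import itertools
import random

# Standard genetic code with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def genetic_code():
    """Map each of the 64 codons to its one-letter amino acid (or '*')."""
    codons = ("".join(c) for c in itertools.product(BASES, repeat=3))
    return dict(zip(codons, AMINO_ACIDS))


def single_step_pairs(code):
    """Return the sorted unordered amino acid pairs reachable by changing
    exactly one nucleotide, ignoring stop codons and synonymous changes."""
    pairs = set()
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                other = code[neighbor]
                if other != "*" and other != aa:
                    pairs.add(tuple(sorted((aa, other))))
    return sorted(pairs)


def random_rate_classes(pairs, n_classes, seed=0):
    """Assign each amino acid pair to one of n_classes rate classes
    uniformly at random (hypothetical baseline, not the authors' code)."""
    rng = random.Random(seed)
    return {pair: rng.randrange(n_classes) for pair in pairs}


if __name__ == "__main__":
    pairs = single_step_pairs(genetic_code())
    classes = random_rate_classes(pairs, n_classes=3, seed=42)
    print(len(pairs), "single-step amino acid pairs")
    for pair in pairs[:5]:
        print(pair, "-> class", classes[pair])
```

In a full analysis, each class would receive its own non-synonymous rate parameter in the codon substitution matrix, and the fit of this randomly partitioned model would be compared against the single rate model.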