Kubatko Laura, Shah Premal, Herbei Radu, Gilchrist Michael A
Department of Statistics, The Ohio State University, Columbus, OH 43210, United States; Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, United States.
Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, United States.
Mol Phylogenet Evol. 2016 Jan;94(Pt A):290-7. doi: 10.1016/j.ympev.2015.08.026. Epub 2015 Sep 8.
The quality of phylogenetic inference made from protein-coding genes depends, in part, on the realism with which the codon substitution process is modeled. Here we propose a new mechanistic model that combines the standard M0 substitution model of Yang (1997) with a simplified model from Gilchrist (2007) that includes selection on synonymous substitutions as a function of codon-specific nonsense error rates. We tested the new model by applying it to 104 protein-coding genes in brewer's yeast, and used the AIC to compare its fit with that of the standard M0 model and of the mutation-selection model of Yang and Nielsen (2008). The new model fit significantly better than the basic M0 model in approximately 85% of cases and better than the M0 model with estimated codon frequencies in approximately 25% of cases, but it fit better than the mutation-selection model in only a few cases. However, our model includes a parameter that can be interpreted as a measure of the rate of protein production, and the estimates of this parameter were highly correlated with an independent measure of protein production for the yeast genes considered here. Finally, for a subset of the genes considered, the new model favored a different phylogeny, indicating that substitution model choice may affect the estimated phylogeny.
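As a minimal sketch of the AIC comparison mentioned in the abstract, the snippet below computes AIC = 2k - 2 ln L for several fitted models of a single gene and ranks them, with lower AIC indicating better fit. The log-likelihoods, parameter counts, and model labels are hypothetical placeholders, not values from the study, and the abstract does not state what criterion was used to call a difference significant.

```python
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_by_aic(fits: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float, float]]:
    """Rank models by AIC.

    fits maps a model name to (maximized log-likelihood, number of free parameters).
    Returns (name, AIC, delta AIC relative to the best model), best model first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    return sorted(
        ((name, score, score - best) for name, score in scores.items()),
        key=lambda row: row[1],
    )


# Hypothetical fits for one gene; all numbers are illustrative only.
example_fits = {
    "M0": (-5230.0, 12),
    "new_model": (-5221.0, 13),
    "mutation_selection": (-5218.0, 20),
}

if __name__ == "__main__":
    for name, score, delta in rank_by_aic(example_fits):
        print(f"{name}: AIC={score:.1f}, delta={delta:.1f}")
```

Because AIC adds a penalty of 2 per free parameter, a model with a higher likelihood can still rank lower if it needs many more parameters to reach it, as the hypothetical mutation-selection entry does here.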