Professional Programme for Agricultural Bioinformatics, University of Tokyo, 1-1-1 Yayoi Bunkyo-Ku, Tokyo, Japan.
Syst Biol. 2009 Apr;58(2):199-210. doi: 10.1093/sysbio/syp015. Epub 2009 Jun 29.
Statistical models for the evolution of molecular sequences play an important role in the study of evolutionary processes. For the evolutionary analysis of protein-coding sequences, 3 types of evolutionary models are available: 1) nucleotide, 2) amino acid, and 3) codon substitution models. Selecting appropriate models can greatly improve the estimation of phylogenies and divergence times and the detection of positive selection. Although much attention has been paid to the comparisons among the same types of models, relatively little attention has been paid to the comparisons among the different types of models. Additionally, because such models have different data structures, comparison of those models using conventional model selection criteria such as Akaike information criterion (AIC) or Bayesian information criterion (BIC) is not straightforward. Here, we suggest new procedures to convert models of the above-mentioned 3 types to 64-dimensional models with nucleotide triplet substitution. These conversion procedures render it possible to statistically compare the models of these 3 types by using AIC or BIC. By analyzing divergent and conserved interspecific mammalian sequences and intraspecific human population data, we show the superiority of the codon substitution models and discuss the advantages and disadvantages of the models of the 3 types.
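As a rough illustration of why a shared 64-state representation makes AIC and BIC comparable, the sketch below lifts a nucleotide model to nucleotide triplets and scores fits with the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n. It is not the authors' conversion procedure, which the abstract does not spell out. It assumes the three codon positions evolve independently under one 4x4 transition matrix, so the triplet transition probability is the product of three nucleotide probabilities. The function names, the toy matrix, and the choice of n as the number of triplet sites are all hypothetical.

```python
import itertools
import math

NUCLEOTIDES = "ACGT"
# All 64 nucleotide triplets, the common state space for the three model types.
TRIPLETS = ["".join(t) for t in itertools.product(NUCLEOTIDES, repeat=3)]


def toy_nucleotide_matrix(p_same=0.91):
    """Hypothetical Jukes-Cantor-like 4x4 transition matrix (illustration only)."""
    p_diff = (1.0 - p_same) / 3.0
    return {a: {b: (p_same if a == b else p_diff) for b in NUCLEOTIDES}
            for a in NUCLEOTIDES}


def triplet_transition_prob(p_nuc, src, dst):
    """P(src -> dst) for triplets, assuming the three positions are independent."""
    return math.prod(p_nuc[a][b] for a, b in zip(src, dst))


def aic(log_likelihood, num_params):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * num_params


def bic(log_likelihood, num_params, sample_size):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + num_params * math.log(sample_size)


def best_model(fits, sample_size, criterion="aic"):
    """Return the name of the model with the lowest criterion value.

    fits maps a model name to (log_likelihood, num_params), where every
    log-likelihood is computed on the same triplet-coded alignment.
    """
    def score(item):
        log_l, k = item[1]
        return aic(log_l, k) if criterion == "aic" else bic(log_l, k, sample_size)
    return min(fits.items(), key=score)[0]
```

Once every model assigns probabilities to the same triplet data, the log-likelihoods passed to `best_model` refer to the same observations, which is the condition under which AIC and BIC differences are meaningful.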