Anisimova Maria, Kosiol Carolin
Institute of Computational Science, Swiss Federal Institute of Technology, Zurich, Switzerland.
Mol Biol Evol. 2009 Feb;26(2):255-71. doi: 10.1093/molbev/msn232. Epub 2008 Oct 14.
This review is motivated by the recent explosion in the number of studies both developing and improving probabilistic models of codon evolution. The first codon models were parametric and focused on estimating the effects of selective pressure on the protein via an explicit parameter in the maximum likelihood framework. Likelihood ratio tests of nested codon models armed biologists with powerful tools, which provided unambiguous evidence for positive selection in real data. This, in turn, triggered a new wave of methodological developments. The new generation of models views the codon evolution process in a more sophisticated way, relaxing several mathematical assumptions. These models make greater use of physicochemical amino acid properties, genetic code machinery, and the large amounts of data in the public domain. An overview of the most recent advances in modeling codon evolution is presented here, and a wide range of their applications to real data is discussed. On the downside, the availability of a large variety of models, each accounting for various biological factors, increases the margin for misinterpretation: the biological meaning of certain parameters may vary among models, and model selection procedures also deserve greater attention. A solid understanding of the modeling assumptions and their applicability is essential for successful statistical data analysis.
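The likelihood ratio test of nested codon models mentioned above compares the maximized log-likelihoods of a null model and a more general alternative (for example, one that allows a class of sites with ω > 1). The statistic 2ΔlnL is referred to a chi-squared distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below is illustrative and not taken from the review. The function names and the log-likelihood values in the usage example are hypothetical. The chi-squared tail probability is computed with standard closed forms for integer degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus sqrt(2x/pi) * exp(-x/2) * sum of x^(i-1) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Hypothetical helper: return (2 * delta lnL, chi-squared p-value) for nested models."""
    # Clamp at zero: a slightly negative value usually means the optimizer
    # did not fully converge for the alternative model.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-994.0, df=2)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4g}")
```

The plain chi-squared reference is only an approximation. When the null model fixes a parameter on the boundary of its allowed range, a mixture of chi-squared distributions is often used instead, which is one reason the choice of test deserves care.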