Woodhams Michael D, Fernández-Sánchez Jesús, Sumner Jeremy G
School of Physical Sciences, University of Tasmania, Hobart, TAS 7005, Australia and Departament de Matemàtica Aplicada I, Universitat Politècnica de Catalunya, Barcelona, Spain
Syst Biol. 2015 Jul;64(4):638-50. doi: 10.1093/sysbio/syv021. Epub 2015 Apr 8.
When the process underlying DNA substitutions varies across evolutionary history, some standard Markov models underlying phylogenetic methods are mathematically inconsistent. The most prominent example is the general time-reversible model (GTR), together with some, but not all, of its submodels. To rectify this deficiency, nonhomogeneous Lie Markov models have been identified as the class of models that are consistent in the face of a changing process of DNA substitutions regardless of taxon sampling. Some well-known models in popular use are within this class, but are either overly simplistic (e.g., the Kimura two-parameter model) or overly complex (the general Markov model). On a diverse set of biological data sets, we test a hierarchy of Lie Markov models spanning the full range of parameter richness. Compared against the benchmark of the ever-popular GTR model, we find that as a whole the Lie Markov models perform well, with the best-performing models having 8–10 parameters and the ability to recognize the distinction between purines and pyrimidines.
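The contrast between consistent and inconsistent models can be illustrated with a small check. In the Lie Markov framework, a model qualifies when the linear span of its rate matrices is closed under the commutator [Q1, Q2] = Q1·Q2 - Q2·Q1. This framework is background from the Lie Markov literature and is not spelled out in this abstract. The sketch below is a hypothetical illustration, not the authors' code. It builds rate matrices for the Kimura two-parameter model (K2P) and for SYM, the equal-frequency submodel of GTR, using hand-picked rates. It then tests whether their commutators stay inside each model's span. The helper names (`rate_matrix`, `k2p`, `sym`, `commutator`, `is_zero`, `is_symmetric`) and the nucleotide order A, G, C, T are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]

# Assumed nucleotide order: A, G (purines), C, T (pyrimidines).


def rate_matrix(off: Matrix) -> Matrix:
    """Copy off-diagonal rates and set each diagonal entry so its row sums to zero."""
    n = len(off)
    q = [[float(off[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def matmul(x: Matrix, y: Matrix) -> Matrix:
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def commutator(x: Matrix, y: Matrix) -> Matrix:
    """Lie bracket [X, Y] = XY - YX."""
    xy, yx = matmul(x, y), matmul(y, x)
    n = len(x)
    return [[xy[i][j] - yx[i][j] for j in range(n)] for i in range(n)]


def k2p(alpha: float, beta: float) -> Matrix:
    """K2P rates: transitions (A<->G, C<->T) at alpha, each transversion at beta."""
    return rate_matrix([
        [0, alpha, beta, beta],
        [alpha, 0, beta, beta],
        [beta, beta, 0, alpha],
        [beta, beta, alpha, 0],
    ])


def sym(ag: float, ac: float, at: float, gc: float, gt: float, ct: float) -> Matrix:
    """Equal-frequency GTR (SYM): one symmetric exchange rate per nucleotide pair."""
    return rate_matrix([
        [0, ag, ac, at],
        [ag, 0, gc, gt],
        [ac, gc, 0, ct],
        [at, gt, ct, 0],
    ])


def is_zero(m: Matrix, tol: float = 1e-12) -> bool:
    return all(abs(v) <= tol for row in m for v in row)


def is_symmetric(m: Matrix, tol: float = 1e-12) -> bool:
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))


# K2P: any two members commute, so the bracket is the zero matrix,
# which trivially lies in the model's span.
k2p_bracket = commutator(k2p(1.0, 2.0), k2p(3.0, 0.5))

# SYM: the bracket of two symmetric matrices is antisymmetric. Unless it is zero,
# it is therefore not symmetric and falls outside the span of SYM rate matrices.
sym_bracket = commutator(sym(1, 0, 0, 0, 0, 0), sym(0, 1, 0, 0, 0, 0))

print(is_zero(k2p_bracket))       # True
print(is_symmetric(sym_bracket))  # False
print(sym_bracket[0])             # [0.0, 1.0, -1.0, 0.0]
```

The SYM pair was chosen so the bracket is easy to verify by hand. It is a cyclic A -> G -> C -> A flow that no symmetric rate matrix can represent. Not every SYM pair exposes the failure. For example, a SYM matrix commutes with the equal-rates (Jukes-Cantor-like) matrix, so such pairs give a zero bracket.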