Bioinformatics and Genomics Group, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Catalonia, Spain.
Mol Biol Evol. 2012 Mar;29(3):929-37. doi: 10.1093/molbev/msr259. Epub 2011 Oct 17.
In phylogenetic inference, an evolutionary model describes the substitution processes along each edge of a phylogenetic tree. Misspecification of the model has important implications for the analysis of phylogenetic data. Conventionally, however, the selection of a suitable evolutionary model is based on heuristics or relies on the choice of an approximate input tree. We introduce a method for model Selection in Phylogenetics based on linear INvariants (SPIn), which uses recent insights on linear invariants to characterize a model of nucleotide evolution for phylogenetic mixtures with any number of components. Linear invariants are constraints among the joint probabilities of the bases in the operational taxonomic units that hold irrespective of the tree topologies appearing in the mixtures. SPIn therefore requires no input tree and is designed to deal with nonhomogeneous phylogenetic data consisting of multiple sequence alignments showing different patterns of evolution, for example, concatenated genes, exons, and/or introns. Here, we report the results of evaluating the proposed method on multiple sequence alignments simulated under a variety of single-tree and mixture settings for both continuous- and discrete-time models. In the simulations, SPIn successfully recovers the underlying evolutionary model and is shown to perform better than existing approaches.
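To make the notion of a linear invariant concrete, the following is a minimal sketch and not the SPIn procedure itself. It assumes a four-taxon setting, a uniform root distribution, and discrete-time matrices of Kimura two-parameter (K80) shape, with Jukes-Cantor (JC69) as the special case in which the transition and transversion probabilities coincide. The function names, edge parameters, and the particular invariant tested (equality of the pattern probabilities for AAGG and AACC) are illustrative assumptions. The equality follows from the base-permutation symmetry of JC69, so it holds for either topology and for their mixture, whereas it generally fails under K80.

```python
from itertools import product

BASES = "ACGT"  # index order: A=0, C=1, G=2, T=3


def k80_matrix(a, b):
    """Discrete-time matrix of K80 shape: `a` is the transition probability,
    `b` the probability of each of the two transversions. a == b gives JC69."""
    m = [[b] * 4 for _ in range(4)]
    for i in range(4):
        m[i][i ^ 2] = a              # A<->G and C<->T are transitions
        m[i][i] = 1.0 - a - 2.0 * b  # rows sum to one
    return m


def quartet_joint(m1, m2, m3, m4, m5, cherries):
    """Joint pattern probabilities on a quartet tree. The root (uniform
    distribution) sits at the internal node joining the first cherry; m5 is
    the internal edge, m1..m4 are the pendant edges in cherry order."""
    (i, j), (k, l) = cherries
    joint = {}
    for x in product(range(4), repeat=4):
        total = 0.0
        for u, v in product(range(4), repeat=2):
            total += (0.25 * m1[u][x[i]] * m2[u][x[j]] * m5[u][v]
                      * m3[v][x[k]] * m4[v][x[l]])
        joint[x] = total
    return joint


def two_tree_mixture(mats_a, mats_b, weight=0.3):
    """Mixture of topologies 12|34 and 13|24 with different edge parameters."""
    p = quartet_joint(*mats_a, cherries=((0, 1), (2, 3)))
    q = quartet_joint(*mats_b, cherries=((0, 2), (1, 3)))
    return {x: weight * p[x] + (1.0 - weight) * q[x] for x in p}


def pattern(s):
    return tuple(BASES.index(c) for c in s)


def gap(joint):
    """Candidate linear invariant: p(AAGG) - p(AACC)."""
    return joint[pattern("AAGG")] - joint[pattern("AACC")]


# Hypothetical per-edge parameters for the two mixture components.
rates_a = (0.05, 0.10, 0.02, 0.08, 0.03)
rates_b = (0.07, 0.01, 0.09, 0.04, 0.06)

jc_mix = two_tree_mixture([k80_matrix(r, r) for r in rates_a],
                          [k80_matrix(r, r) for r in rates_b])
k80_mix = two_tree_mixture([k80_matrix(3 * r, r) for r in rates_a],
                           [k80_matrix(3 * r, r) for r in rates_b])

jc_gap = gap(jc_mix)    # zero up to floating-point rounding
k80_gap = gap(k80_mix)  # clearly nonzero: the invariant is specific to JC69
print(f"JC69 mixture gap: {jc_gap:.2e}")
print(f"K80 mixture gap:  {k80_gap:.2e}")
```

In this toy setting the constraint is satisfied without knowing which topologies entered the mixture, which is the property that lets an invariant-based test dispense with an input tree. With real alignments the joint probabilities would have to be estimated from observed site-pattern frequencies, so the gap would only be approximately zero and some statistical criterion would be needed to decide between models.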