Center for Computational Medicine and Bioinformatics, University of Michigan, USA.
Mol Biol Evol. 2010 Mar;27(3):552-69. doi: 10.1093/molbev/msp250. Epub 2009 Oct 15.
Concatenated sequence alignments are often used to infer species-level relationships. Previous studies have shown that analysis of concatenated data using maximum likelihood (ML) can produce misleading results when loci have differing gene tree topologies due to incomplete lineage sorting. Here, we develop a polynomial time method that utilizes the modified mincut supertree algorithm to construct an estimated species tree from inferred rooted triples of concatenated alignments. We term this method SuperMatrix Rooted Triple (SMRT) and use the notation SMRT-ML when rooted triples are inferred by ML. We use simulations to investigate the performance of SMRT-ML under Jukes-Cantor and general time-reversible substitution models for four- and five-taxon species trees and also apply the method to an empirical data set of yeast genes. We find that SMRT-ML converges to the correct species tree in many cases in which ML on the full concatenated data set fails to do so. SMRT-ML can be conservative in that its output tree is often partially unresolved for problematic clades. We show analytically that when the species tree is clocklike and mutations occur under the Cavender-Farris-Neyman substitution model, as the number of genes increases, SMRT-ML is increasingly likely to infer the correct species tree even when the most likely gene tree does not match the species tree. SMRT-ML is therefore a computationally efficient and statistically consistent estimator of the species tree when gene trees are distributed according to the multispecies coalescent model.
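The pipeline described in the abstract (infer a rooted triple for every set of three taxa from the concatenated alignment, then assemble the triples into one tree) can be illustrated with a minimal sketch. It makes two simplifying assumptions that differ from the published method: the rooted triple is chosen by smallest Hamming distance rather than by ML, and the triples are combined with the simpler BUILD algorithm of Aho et al. rather than the modified mincut supertree algorithm, collapsing to a polytomy when triples conflict. All function names here are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of alignment columns at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def infer_triple(aln, a, b, c):
    """Return ((x, y), z), where x and y are the inferred sister pair.
    Assumption: the closest pair is taken as sisters (a stand-in for ML)."""
    options = [((a, b), c), ((a, c), b), ((b, c), a)]
    return min(options, key=lambda t: hamming(aln[t[0][0]], aln[t[0][1]]))

def build(taxa, triples):
    """BUILD-style assembly of rooted triples into a nested-tuple tree."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    if len(taxa) == 2:
        return tuple(taxa)
    parent = {t: t for t in taxa}  # union-find over the current taxon set

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Keep only triples whose three taxa all lie in the current set.
    inside = [t for t in triples
              if t[0][0] in parent and t[0][1] in parent and t[1] in parent]
    for (x, y), _ in inside:
        parent[find(x)] = find(y)  # sister taxa must share a subtree
    groups = {}
    for t in taxa:
        groups.setdefault(find(t), []).append(t)
    if len(groups) == 1:
        return tuple(taxa)  # conflicting triples: leave the clade unresolved
    return tuple(build(g, inside) for g in sorted(groups.values()))

def smrt_sketch(aln):
    """Estimate a tree from a dict mapping taxon name to aligned sequence."""
    triples = [infer_triple(aln, a, b, c)
               for a, b, c in combinations(sorted(aln), 3)]
    return build(list(aln), triples)

toy = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAC",
       "C": "GGGGAAAAAA", "D": "GGGGTAAAAA"}
print(smrt_sketch(toy))  # (('A', 'B'), ('C', 'D'))
```

Returning an unresolved clade when triples conflict loosely mirrors the conservative behaviour noted in the abstract, although the modified mincut algorithm resolves such conflicts in a more principled way than this sketch does.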