Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil.
Evol Bioinform Online. 2013 Aug 13;9:317-25. doi: 10.4137/EBO.S12483. eCollection 2013.
Phylogenetic analysis of multilocus data sets is performed by means of supermatrix (SM) or supertree (ST) approaches. More recently, methods that infer the species tree (SppT) under the multispecies coalescent have also been implemented to tackle this problem. The relative performance of these three major strategies has generally been assessed using simulated biological sequences. However, sequence simulation may not fully replicate the complexity of the evolutionary process, and concerns have been raised about the usefulness of in silico sequences for studying the performance of phylogenetic methods. Here, we used both classical simulation and empirical data to investigate the relative performance of the ST, SM, and SppT methods. SM analyses performed better than the ST and SppT methods in simulations, but not in empirical analyses, where some ST methods significantly outperformed the others. Additionally, SM was the only method that was robust to evolutionary model violations in simulations. These results show that conventional biological sequence simulation cannot adequately resolve which method most efficiently recovers the species tree. In such simulations, the SM approach recovers the established phylogeny in most instances, whereas the performance of the ST and SppT methods deteriorates in simpler cases. Analyses based on empirical and simulated sequences yielded largely inconsistent results, with the latter biased towards an apparent superiority of SM approaches.
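For readers unfamiliar with the supermatrix strategy named above, the following minimal sketch illustrates the concatenation step that defines it: per-locus alignments are joined end to end into a single matrix, which is then passed to a tree-inference program. It is not the pipeline used in the study. The function name `build_supermatrix`, the dictionary-based alignment format, and the use of `?` for taxa missing from a locus are all assumptions made for illustration.

```python
def build_supermatrix(loci, missing="?"):
    """Concatenate per-locus alignments into one supermatrix.

    `loci` is a list of dicts mapping taxon name -> aligned sequence.
    Taxa absent from a locus are padded with `missing` (an assumption
    of this sketch; real pipelines may use gaps or drop the taxon).
    """
    # Collect every taxon seen in any locus, in a stable order.
    taxa = sorted({taxon for aln in loci for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}

    for aln in loci:
        # All sequences within one locus must share the same aligned length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        width = lengths.pop()

        # Append this locus to each taxon's row, padding absent taxa.
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, missing * width)

    return supermatrix


# Hypothetical toy data: two loci, with taxon B missing from the second.
loci = [
    {"A": "ACGT", "B": "ACGA", "C": "ACTT"},
    {"A": "GG", "C": "GA"},
]
sm = build_supermatrix(loci)
```

A supertree analysis would instead infer one tree per locus and then combine the trees, so no concatenated matrix of this kind is built.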