Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
Department of Computer Science, The University of Texas at Austin, Texas, 78712, USA.
BMC Genomics. 2020 Feb 10;21(1):136. doi: 10.1186/s12864-020-6519-y.
Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, estimating a species tree from a collection of gene trees can be complicated due to the presence of gene tree incongruence resulting from incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent process. Maximum likelihood and Bayesian MCMC methods can potentially result in accurate trees, but they do not scale well to large datasets.
We present STELAR (Species Tree Estimation by maximizing tripLet AgReement), a new, fast, highly accurate, and statistically consistent coalescent-based method for estimating species trees from a collection of gene trees. We formalized the constrained triplet consensus (CTC) problem and showed that the solution to the CTC problem is a statistically consistent estimate of the species tree under the multi-species coalescent (MSC) model. STELAR is an efficient dynamic-programming-based solution to the CTC problem that is highly accurate and scalable. We evaluated the accuracy of STELAR in comparison with SuperTriplets, an alternative fast and highly accurate triplet-based supertree method, and with MP-EST and ASTRAL, two of the most popular and accurate coalescent-based methods. Experimental results suggest that STELAR matches the accuracy of ASTRAL and improves on MP-EST and SuperTriplets.
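To make the triplet-agreement criterion concrete, here is a minimal Python sketch (not STELAR's implementation). It decomposes each rooted tree into the rooted triplets ab|c that it induces and scores a candidate species tree by the number of gene-tree triplets it shares, summed over all gene trees. The nested-tuple tree encoding and the names `clusters`, `triplets`, and `triplet_agreement` are hypothetical choices made for illustration. The sketch assumes rooted gene trees and brute-forces every triple of taxa. It does not implement the dynamic programming STELAR uses to solve the CTC problem.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# Hypothetical tree encoding: a leaf is a taxon name (str); an internal
# node is a tuple of child subtrees. Trees are assumed to be rooted.
Triplet = Tuple[FrozenSet[str], str]  # ({a, b}, c) encodes the triplet ab|c


def clusters(tree) -> List[FrozenSet[str]]:
    """Leaf sets of all subtrees, in postorder (the root's leaf set comes last)."""
    out: List[FrozenSet[str]] = []

    def walk(node) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def triplets(tree) -> Set[Triplet]:
    """Rooted triplets ab|c induced by the tree.

    ab|c holds if some cluster contains a and b but not c. A triple left
    unresolved by a polytomy contributes no triplet.
    """
    cls = clusters(tree)
    taxa = sorted(cls[-1])
    found: Set[Triplet] = set()
    for a, b, c in combinations(taxa, 3):
        for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
            if any(x in k and y in k and z not in k for k in cls):
                found.add((frozenset((x, y)), z))
    return found


def triplet_agreement(species_tree, gene_trees) -> int:
    """Gene-tree triplets that agree with the species tree, summed over gene trees."""
    sp = triplets(species_tree)
    return sum(len(sp & triplets(g)) for g in gene_trees)


genes = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
caterpillar = ((("A", "B"), "C"), "D")
balanced = (("A", "B"), ("C", "D"))
print(triplet_agreement(caterpillar, genes))  # 9
print(triplet_agreement(balanced, genes))  # 5
```

In this toy example, the caterpillar tree agrees with 9 gene-tree triplets and the balanced tree agrees with 5, so a triplet-agreement criterion would prefer the caterpillar. Because the sketch enumerates every triple of taxa in every tree, it is only suitable for small inputs.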
Theoretical and empirical results (on both simulated and real biological datasets) suggest that STELAR is a valuable technique for species tree estimation from gene tree distributions.