Le Thien, Sy Aaron, Molloy Erin K, Zhang Qiuyi, Rao Satish, Warnow Tandy
IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):2-15. doi: 10.1109/TCBB.2020.2990867. Epub 2021 Feb 3.
Incremental tree building (INC) is a new phylogeny estimation method that has been proven to be absolute fast converging under standard sequence evolution models. A variant of INC, called Constrained-INC, is designed for use in divide-and-conquer pipelines for phylogeny estimation, in which a set of species is divided into disjoint subsets, trees are computed on the subsets using a selected base method, and the subset trees are then combined. We evaluate the accuracy of INC and Constrained-INC for gene tree and species tree estimation on simulated datasets, and compare them to similar pipelines using NJMerge (another method that merges disjoint trees). For gene tree estimation, we find that INC has very poor accuracy in comparison to standard methods, and even Constrained-INC (using maximum likelihood methods to compute constraint trees) does not match the accuracy of the better maximum likelihood methods. Results for species trees are somewhat different, with Constrained-INC coming close to the accuracy of the best species tree estimation methods while being much faster; furthermore, using Constrained-INC allows species tree estimation methods to scale to large datasets within limited computational resources. Overall, this study exposes the benefits and limitations of divide-and-conquer strategies for large-scale phylogenetic tree estimation.
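To make the merging step concrete, the following is a minimal sketch of one check a divide-and-conquer pipeline might run on its output: whether a combined tree, restricted to the leaves of each disjoint subset, still agrees with the tree computed on that subset. It is not the authors' implementation of INC, Constrained-INC, or NJMerge. The nested-tuple tree representation, the function names (`leaves`, `restrict`, `bipartitions`, `respects`), and the reading of "combined" as "the merged tree induces or refines each subset tree" are all assumptions made for illustration.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def restrict(tree, keep):
    """Restrict a tree to the leaves in `keep`, suppressing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def bipartitions(tree):
    """Return the non-trivial unrooted bipartitions of a tree.

    Each bipartition is stored as the side that does NOT contain the
    smallest leaf label, so the two rootings of one split compare equal.
    """
    all_leaves = frozenset(leaves(tree))
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def respects(merged, constraint):
    """Assumed criterion: the merged tree, restricted to the constraint
    tree's leaves, contains every bipartition of the constraint tree."""
    induced = restrict(merged, leaves(constraint))
    return bipartitions(constraint) <= bipartitions(induced)


# Hypothetical toy data: two disjoint subset trees and two candidate merged trees.
subset_trees = [(("A", "B"), ("C", "D")), (("E", "F"), ("G", "H"))]
merged_ok = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
merged_bad = ((("A", "C"), ("B", "D")), (("E", "F"), ("G", "H")))

ok_results = [respects(merged_ok, t) for t in subset_trees]
bad_results = [respects(merged_bad, t) for t in subset_trees]
```

In this toy data, `merged_bad` groups A with C and so conflicts with the first subset tree, while it still agrees with the second. Comparing unrooted bipartitions rather than the tuples themselves keeps the check independent of how each tree happens to be rooted or ordered.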