Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA.
Bioinformatics. 2022 Jul 11;38(14):3565-3573. doi: 10.1093/bioinformatics/btac349.
Consensus methods can be used for reconstructing a species tree from several gene trees, which exhibit incompatible topologies due to incomplete lineage sorting. Motivated by the fact that there are no anomalous rooted gene trees with three taxa and no anomalous unrooted gene trees with four taxa in the multispecies coalescent model, several contemporary methods form the gene tree consensus by finding the median tree with respect to the triplet or quartet distance-i.e. estimate the species tree as the tree which minimizes the sum of triplet or quartet distances to the input gene trees. These methods reformulate the solution to the consensus problem as the solution to a recursively solved dynamic programming (DP) problem. We present an iterative, easily parallelizable approach to finding the exact median triplet tree and implement it as an open source software package that can also find suboptimal consensus trees within a specified triplet distance to the gene trees. The most time-consuming step for methods of this type is the creation of a weights array for all possible subtree bipartitions. By grouping the relevant calculations and array update operations of different bipartitions of the same subtree together, this implementation finds the exact median tree of many gene trees faster than comparable methods, has better scaling properties with respect to the number of gene trees and has a smaller memory footprint.
RTIST (Rooted Triple Inference of Species Trees) finds the exact median triplet tree of a set of gene trees. Its runtime and memory footprints scale better than existing algorithms. RTIST can resolve all the non-unique median trees, as well as sub-optimal consensus trees within a user-specified triplet distance to the median. Although it is limited in the number of taxa (≤20), its runtime changes little when the number of gene trees is changed by several orders of magnitude.
RTIST is written in C and Python. It is freely available at https://github.com/glebzhelezov/rtist.
共识方法可用于从几个基因树重建物种树,由于不完全谱系排序,这些基因树表现出不相容的拓扑结构。受多物种合并模型中不存在具有三个分类群的异常有根基因树和不存在具有四个分类群的异常无根基因树的事实的启发,几种当代方法通过找到三重或四重距离的中位数树来形成基因树共识,即估计物种树作为最小化到输入基因树的三重或四重距离之和的树。这些方法将共识问题的解决方案重新表述为递归求解动态规划(DP)问题的解决方案。我们提出了一种迭代的、易于并行化的方法来找到精确的中位数三重树,并将其实现为一个开源软件包,该软件包还可以在指定的三重距离内找到基因树的次优共识树。这种类型的方法最耗时的步骤是为所有可能的子树二分法创建权重数组。通过将同一子树的不同二分法的相关计算和数组更新操作分组在一起,该实现可以比类似的方法更快地找到许多基因树的精确中位数树,并且在基因树数量方面具有更好的缩放属性,并且具有较小的内存占用。
RTIST(有根三重物种树推断)找到一组基因树的精确中位数三重树。它的运行时和内存足迹比现有算法更好。RTIST 可以解析所有非唯一中位数树,以及用户指定的中位数三重距离内的次优共识树。尽管它受到分类群数量(≤20)的限制,但当基因树数量变化几个数量级时,其运行时间变化很小。
RTIST 用 C 和 Python 编写。它可在 https://github.com/glebzhelezov/rtist 上免费获得。