Whidden Chris, Matsen Frederick A
IEEE/ACM Trans Comput Biol Bioinform. 2019 May-Jun;16(3):898-911. doi: 10.1109/TCBB.2018.2802911.
The subtree prune-and-regraft (SPR) distance metric is a fundamental way of comparing evolutionary trees. It has wide-ranging applications, such as to study lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the rooted version of SPR distance can be computed relatively efficiently between rooted trees using fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF formulation is known for the unrooted case. Correspondingly, previous algorithms are unable to compute unrooted SPR distances larger than 7. In this paper, we substantially advance understanding of and computational algorithms for the unrooted SPR distance. First, we identify four properties of optimal SPR paths, each of which suggests that no MAF formulation exists in the unrooted case. Then, we introduce the replug distance, a new lower bound on the unrooted SPR distance that is amenable to MAF methods, and give an efficient fixed-parameter algorithm for calculating it. Finally, we develop a "progressive A*" search algorithm using multiple heuristics, including the TBR and replug distances, to exactly compute the unrooted SPR distance. Our algorithm is nearly two orders of magnitude faster than previous methods on small trees, and allows computation of unrooted SPR distances as large as 14 on trees with 50 leaves.
子树剪枝与重嫁接(SPR)距离度量是比较进化树的一种基本方法。它有广泛的应用,比如用于研究横向基因转移、病毒重组以及马尔可夫链蒙特卡罗系统发育推断。尽管使用固定参数可处理的最大一致森林(MAF)算法,有根SPR距离的有根版本在有根树之间可以相对高效地计算,但对于无根情况,尚无已知的MAF公式。相应地,以前的算法无法计算大于7的无根SPR距离。在本文中,我们极大地推进了对无根SPR距离的理解以及相关计算算法。首先,我们确定了最优SPR路径的四个属性,每个属性都表明在无根情况下不存在MAF公式。然后,我们引入了重新插入距离,这是一种适用于MAF方法的无根SPR距离的新下界,并给出了一个计算它的高效固定参数算法。最后,我们开发了一种使用多种启发式方法(包括TBR和重新插入距离)的“渐进A*”搜索算法,以精确计算无根SPR距离。我们的算法在小树方面比以前的方法快近两个数量级,并且能够计算具有50个叶子的树上高达14的无根SPR距离。