Fleischauer Markus, Böcker Sebastian
Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany.
PeerJ. 2018 Jun 8;6:e4987. doi: 10.7717/peerj.4987. eCollection 2018.
Supertree methods enable the reconstruction of large phylogenies. The supertree problem can be formalized in different ways in order to cope with contradictory information in the input. Some supertree methods are based on encoding the input trees in a matrix; other methods try to find minimum cuts in some graph. Recently, we introduced Bad Clade Deletion (BCD) supertrees which combines the graph-based computation of minimum cuts with optimizing a global objective function on the matrix representation of the input trees. The BCD supertree method has guaranteed polynomial running time and is very swift in practice. The quality of reconstructed supertrees was superior to matrix representation with parsimony (MRP) and usually on par with SuperFine for simulated data; but particularly for biological data, quality of BCD supertrees could not keep up with SuperFine supertrees. Here, we present a beam search extension for the BCD algorithm that keeps alive a constant number of partial solutions in each top-down iteration phase. The guaranteed worst-case running time of the new algorithm is still polynomial in the size of the input. We present an exact and a randomized subroutine to generate suboptimal partial solutions. Both beam search approaches consistently improve supertree quality on all evaluated datasets when keeping 25 suboptimal solutions alive. Supertree quality of the BCD Beam Search algorithm is on par with MRP and SuperFine even for biological data. This is the best performance of a polynomial-time supertree algorithm reported so far.
超树方法能够重建大型系统发育树。为了处理输入中的矛盾信息,超树问题可以用不同的方式进行形式化。一些超树方法基于将输入树编码到矩阵中;其他方法则试图在某个图中找到最小割。最近,我们引入了坏分支删除(BCD)超树,它将基于图的最小割计算与在输入树的矩阵表示上优化全局目标函数相结合。BCD超树方法具有多项式运行时间保证,并且在实际应用中非常迅速。对于模拟数据,重建超树的质量优于简约矩阵表示(MRP),并且通常与SuperFine相当;但特别是对于生物数据,BCD超树的质量无法与SuperFine超树相比。在这里,我们提出了一种针对BCD算法的束搜索扩展,在每个自上而下的迭代阶段保留固定数量的部分解。新算法保证的最坏情况运行时间在输入大小上仍然是多项式的。我们提出了一个精确的和一个随机子程序来生成次优部分解。当保留25个次优解时,两种束搜索方法在所有评估数据集上都持续提高了超树质量。即使对于生物数据,BCD束搜索算法的超树质量也与MRP和SuperFine相当。这是迄今为止报道的多项式时间超树算法的最佳性能。