Truszkowski Jakub, Hao Yanqi, Brown Daniel G
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo ON N2L 3G1 Canada.
Algorithms Mol Biol. 2012 Nov 26;7(1):32. doi: 10.1186/1748-7188-7-32.
Recently, we identified a randomized quartet phylogeny algorithm that runs in O(n log n) time with high probability, which is asymptotically optimal. The algorithm returns the correct phylogeny with high probability when quartet errors are independent and occur with known probability, and when the algorithm uses a guide tree on O(log log n) taxa that is correct with high probability. In practice, none of these assumptions holds: quartet errors are positively correlated and occur with unknown probability, and the guide tree is often error prone. Here, we bring our work out of the purely theoretical setting. We present a variety of extensions which, while slowing the algorithm down by only a constant factor, make its performance nearly comparable to that of Neighbour Joining, which requires Θ(n^3) runtime in existing implementations. Our results suggest a new direction for quartet-based phylogenetic reconstruction that may yield striking speed improvements at minimal accuracy cost. An early prototype implementation of our software is available at http://www.cs.uwaterloo.ca/jmtruszk/qtree.tar.gz.
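To make the notion of a quartet query concrete, the sketch below infers the topology of four taxa from pairwise distances using the standard four-point method. The abstract does not say how quartets are inferred, so the choice of the four-point method, the function name `quartet_topology`, and the toy distance matrix `D` are all assumptions made for illustration, not the authors' implementation.

```python
def quartet_topology(d, a, b, c, e):
    """Infer the split of taxa {a, b, c, e} with the four-point method.

    d is a symmetric matrix of pairwise distances (hypothetical input).
    The split whose two within-pair distances sum to the smallest value
    is returned as a pair of pairs, e.g. ((a, b), (c, e)).
    """
    splits = {
        ((a, b), (c, e)): d[a][b] + d[c][e],
        ((a, c), (b, e)): d[a][c] + d[b][e],
        ((a, e), (b, c)): d[a][e] + d[b][c],
    }
    return min(splits, key=splits.get)


# Toy additive distances for the tree ((0,1),(2,3)):
# every pendant edge has length 1 and the internal edge has length 2.
D = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

print(quartet_topology(D, 0, 1, 2, 3))  # ((0, 1), (2, 3))
```

With noisy distances estimated from sequence data, such a query can return the wrong split, which is the kind of quartet error the abstract refers to. A set of n taxa contains n(n-1)(n-2)(n-3)/24 quartets, so a method that runs in O(n log n) time can examine only a small fraction of them.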