列排序：系统发育似然函数的快速计算

Column sorting: rapid calculation of the phylogenetic likelihood function.

作者信息

Kosakovsky Pond Sergei L, Muse Spencer V

机构信息

Department of Mathematics, University of Arizona, Tucson, Arizona 85271, USA.

出版信息

Syst Biol. 2004 Oct;53(5):685-92. doi: 10.1080/10635150490522269.

DOI:10.1080/10635150490522269

PMID:15545249

Abstract

Likelihood applications have become a central approach for molecular evolutionary analyses since the first computationally tractable treatment two decades ago. Although Felsenstein's original pruning algorithm makes likelihood calculations feasible, it is usually possible to take advantage of repetitive structure present in the data to arrive at even greater computational reductions. In particular, alignment columns with certain similarities have components of the likelihood calculation that are identical and need not be recomputed if columns are evaluated in an optimal order. We develop an algorithm for exploiting this speed improvement via an application of graph theory. The reductions provided by the method depend on both the tree and the data, but typical savings range between 15%and 50%. Real-data examples with time reductions of 80%have been identified. The overhead costs associated with implementing the algorithm are minimal, and they are recovered in all but the smallest data sets. The modifications will provide faster likelihood algorithms, which will allow likelihood methods to be applied to larger sets of taxa and to include more thorough searches of the tree topology space.

摘要

自二十年前首次实现可计算处理以来，似然法应用已成为分子进化分析的核心方法。尽管费尔斯滕森最初的剪枝算法使似然计算变得可行，但通常可以利用数据中存在的重复结构来进一步减少计算量。特别是，具有某些相似性的比对列在似然计算中有相同的部分，如果以最优顺序评估列，则无需重新计算。我们通过应用图论开发了一种利用这种速度提升的算法。该方法带来的计算量减少取决于树和数据，但通常节省幅度在15%到50%之间。已经发现了实际数据示例中时间减少80%的情况。实现该算法的额外成本极小，并且除了最小的数据集外，在所有数据集中都能得到弥补。这些改进将提供更快的似然算法，这将使似然法能够应用于更大的分类单元集，并能更全面地搜索树拓扑空间。