Lunter G A, Miklós I, Song Y S, Hein J
Department of Statistics, University of Oxford, Oxford, OX1 3TG, UK.
J Comput Biol. 2003;10(6):869-89. doi: 10.1089/106652703322756122.
We present an efficient algorithm for statistical multiple alignment based on the TKF91 model of Thorne, Kishino, and Felsenstein (1991) on an arbitrary k-leaved phylogenetic tree. The existing algorithms use a hidden Markov model approach, which requires at least O( radical 5(k)) states and leads to a time complexity of O(5(k)L(k)), where L is the geometric mean sequence length. Using a combinatorial technique reminiscent of inclusion/exclusion, we are able to sum away the states, thus improving the time complexity to O(2(k)L(k)) and considerably reducing memory requirements. This makes statistical multiple alignment under the TKF91 model a definite practical possibility in the case of a phylogenetic tree with a modest number of leaves.
我们提出了一种基于索恩、岸野和费尔斯滕森(1991)的TKF91模型,用于在任意k叶系统发育树上进行统计多重比对的高效算法。现有的算法采用隐马尔可夫模型方法,该方法至少需要O(√5(k))个状态,并且导致时间复杂度为O(5(k)L(k)),其中L是几何平均序列长度。通过使用一种让人联想到容斥原理的组合技术,我们能够消除这些状态,从而将时间复杂度提高到O(2(k)L(k)),并显著降低内存需求。这使得在叶数适中的系统发育树下,基于TKF91模型的统计多重比对成为一种切实可行的可能性。