Bilu Yonatan, Agarwal Pankaj K, Kolodny Rachel
Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
IEEE/ACM Trans Comput Biol Bioinform. 2006 Oct-Dec;3(4):408-22. doi: 10.1109/TCBB.2006.53.
Multiple Sequence Alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics were suggested for the problem. We consider a version of the MSA problem where the goal is to find an optimal alignment in which matches are restricted to positions in predefined matching segments. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. We prove that it suffices to find an optimal alignment of the predefined sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify "shortcuts" that expedite the dynamic programming scheme. Empirical study shows that, taken together, these observations lead to an improved running time over the basic dynamic programming algorithm by 4 to 12 orders of magnitude, while still obtaining an optimal solution. Under the additional assumption that matches between segments are transitive, we further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm.
多序列比对(MSA)是计算分子生物学中最基本的问题之一。基于动态规划的寻找最优比对的最知名方案的运行时间会随着输入序列数量呈指数增长。因此,针对该问题提出了许多启发式方法。我们考虑多序列比对问题的一个版本,其目标是找到一个最优比对,其中匹配被限制在预定义匹配片段中的位置。我们提出了几种使动态规划算法更高效的技术,同时在这些限制条件下仍能找到最优解。我们证明,找到预定义序列片段而非单个字母的最优比对就足够了,从而减小了输入规模,进而提高了运行时间。我们还识别出了能加快动态规划方案的“捷径”。实证研究表明,综合起来,这些发现使运行时间相对于基本动态规划算法提高了4到12个数量级,同时仍能获得最优解。在片段间匹配具有传递性的额外假设下,我们通过限制动态规划算法的搜索空间进一步提高了寻找最优解的运行时间。