Department of Pathology, University of Michigan, 4237 Medical Science I, Ann Arbor, MI 48109, USA.
Comput Biol Chem. 2010 Jun;34(3):149-57. doi: 10.1016/j.compbiolchem.2010.05.001. Epub 2010 May 11.
The problem of finding the longest common subsequence (LCS) for an arbitrary number of sequences is a very interesting and challenging problem in computer science. This problem is NP-complete, but because of its importance, many heuristic algorithms have been proposed, such as Long Run, Expansion Algorithm and THSB. However, the performance, either in result quality or in process time, of many current heuristic algorithms deteriorates fast when the number of sequences and sequence length increase. In this paper, we have proposed a post-process heuristic algorithm for the LCS problem, the Deposition and Extension Algorithm (DEA). This algorithm first generates common subsequence by "sequence deposition" based on fine tuning of search range, and then extends this common subsequence. The algorithm is proven to generate Common Subsequences (CSs) with guaranteed lengths. The experiments on different dataset showed that the results of DEA algorithm were better than those of Long Run and Expansion Algorithm, especially on many long sequences. The algorithm also had superior efficiency both in time and memory space.
寻找任意数量序列的最长公共子序列(LCS)的问题是计算机科学中一个非常有趣和具有挑战性的问题。这个问题是 NP 完全的,但是由于它的重要性,已经提出了许多启发式算法,例如 Long Run、Expansion Algorithm 和 THSB。然而,当序列数量和序列长度增加时,许多当前启发式算法的性能,无论是在结果质量还是在处理时间方面,都会迅速恶化。在本文中,我们提出了一种针对 LCS 问题的后处理启发式算法,即沉积和扩展算法(DEA)。该算法首先通过基于搜索范围的微调的“序列沉积”生成公共子序列,然后扩展此公共子序列。该算法被证明可以生成具有保证长度的公共子序列(CSs)。在不同数据集上的实验表明,DEA 算法的结果优于 Long Run 和 Expansion Algorithm,尤其是在许多长序列上。该算法在时间和内存空间方面也具有较高的效率。