Deposition and extension approach to find longest common subsequence for thousands of long sequences.

Affiliation

Department of Pathology, University of Michigan, 4237 Medical Science I, Ann Arbor, MI 48109, USA.

Publication information

Comput Biol Chem. 2010 Jun;34(3):149-57. doi: 10.1016/j.compbiolchem.2010.05.001. Epub 2010 May 11.

DOI:10.1016/j.compbiolchem.2010.05.001

PMID:20570215

Abstract

Finding the longest common subsequence (LCS) of an arbitrary number of sequences is an interesting and challenging problem in computer science. The problem is NP-complete, but because of its importance many heuristic algorithms have been proposed, such as Long Run, Expansion Algorithm and THSB. However, the performance of many current heuristic algorithms, in either result quality or processing time, deteriorates quickly as the number of sequences and the sequence length increase. In this paper, we propose a post-process heuristic algorithm for the LCS problem, the Deposition and Extension Algorithm (DEA). The algorithm first generates a common subsequence by "sequence deposition" based on fine-tuning of the search range, and then extends this common subsequence. The algorithm is proven to generate common subsequences (CSs) of guaranteed length. Experiments on different datasets showed that the results of DEA were better than those of Long Run and Expansion Algorithm, especially for many long sequences. The algorithm was also more efficient in both time and memory.
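To make the terms in the abstract concrete, the following is a minimal Python sketch of what a common subsequence of several sequences is, and of a naive way to extend one. It is not the DEA described in the paper: the abstract does not give the details of the deposition or extension steps, so `is_common_subsequence` and `greedy_extend` are hypothetical helpers written only for illustration.

```python
from typing import Iterable, List


def is_common_subsequence(candidate: str, sequences: Iterable[str]) -> bool:
    """Return True if `candidate` is a subsequence of every input sequence."""
    for seq in sequences:
        it = iter(seq)
        # `ch in it` consumes the iterator up to the first match, so the
        # symbols of `candidate` must appear in `seq` in the same order.
        if not all(ch in it for ch in candidate):
            return False
    return True


def greedy_extend(cs: str, sequences: List[str], alphabet: str) -> str:
    """Naive illustrative extension (an assumption, not the paper's method).

    Repeatedly tries to insert one symbol at any position and keeps the
    first insertion that still yields a common subsequence.
    """
    improved = True
    while improved:
        improved = False
        for pos in range(len(cs) + 1):
            for sym in alphabet:
                trial = cs[:pos] + sym + cs[pos:]
                if is_common_subsequence(trial, sequences):
                    cs = trial
                    improved = True
                    break
            if improved:
                break
    return cs


if __name__ == "__main__":
    seqs = ["ACGTACGT", "AGTTACG", "CAGTAC"]
    start = "AGT"  # a common subsequence chosen by hand for the demo
    print(is_common_subsequence(start, seqs))
    print(greedy_extend(start, seqs, "ACGT"))
```

Each successful insertion lengthens the candidate by one symbol, so the loop stops after at most as many rounds as the shortest input is long. The result is a common subsequence that no single insertion can lengthen, which is not necessarily a longest one.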

Similar articles

Deposition and extension approach to find longest common subsequence for thousands of long sequences.

Comput Biol Chem. 2010 Jun;34(3):149-57. doi: 10.1016/j.compbiolchem.2010.05.001. Epub 2010 May 11.

Towards a better solution to the shortest common supersequence problem: the deposition and reduction algorithm.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S12. doi: 10.1186/1471-2105-7-S4-S12.

A hyper-heuristic for the Longest Common Subsequence problem.

Comput Biol Chem. 2012 Feb;36:42-54. doi: 10.1016/j.compbiolchem.2011.12.004. Epub 2011 Dec 30.

CLAGen: a tool for clustering and annotating gene sequences using a suffix tree algorithm.

Biosystems. 2006 Jun;84(3):175-82. doi: 10.1016/j.biosystems.2005.11.001. Epub 2005 Dec 27.

A space-efficient algorithm for the constrained pairwise sequence alignment problem.

Genome Inform. 2005;16(2):237-46.

Comput Biol Chem. 2010 Apr;34(2):131-6. doi: 10.1016/j.compbiolchem.2010.03.007. Epub 2010 Apr 4.

An OpenMP-based tool for finding longest common subsequence in bioinformatics.

BMC Res Notes. 2019 Apr 11;12(1):220. doi: 10.1186/s13104-019-4256-6.

Multiple sequence alignment algorithm based on a dispersion graph and ant colony algorithm.

J Comput Chem. 2009 Oct;30(13):2031-8. doi: 10.1002/jcc.21203.

Fast, optimal alignment of three sequences using linear gap costs.

J Theor Biol. 2000 Dec 7;207(3):325-36. doi: 10.1006/jtbi.2000.2177.

Longest common substring in Longest Common Subsequence's solution service: A novel hyper-heuristic.

Comput Biol Chem. 2023 Aug;105:107882. doi: 10.1016/j.compbiolchem.2023.107882. Epub 2023 May 19.

Cited by

New Construction of Family of MLCS Algorithms.

J Healthc Eng. 2021 Jan 19;2021:6636710. doi: 10.1155/2021/6636710. eCollection 2021.
