Towards a better solution to the shortest common supersequence problem: the deposition and reduction algorithm.

Author information

Ning Kang, Leong Hon Wai

Affiliation

Department of Computer Science, National University of Singapore, Science Drive, Singapore 117543, Singapore.

Publication information

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S12. doi: 10.1186/1471-2105-7-S4-S12.

DOI: 10.1186/1471-2105-7-S4-S12

PMID: 17217504

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1780115/

Abstract

BACKGROUND

The problem of finding a Shortest Common Supersequence (SCS) of a set of sequences is an important problem with applications in many areas. It is a key problem in biological sequence analysis. The SCS problem is well known to be NP-complete, and many heuristic algorithms have been proposed. Some heuristics work well on a few long sequences (as in sequence comparison applications); others work well on many short sequences (as in oligo-array synthesis). Unfortunately, most do not work well on large SCS instances with many long sequences.
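To make the problem statement concrete: a string S is a common supersequence of a set of sequences if every sequence can be obtained from S by deleting characters, and an SCS is a shortest such S. The minimal Python sketch below tests this property; the helper names are our own and are not taken from the paper. For the sequences ACG, AGT and CGT, the string ACGT is a common supersequence, and because a string of length 3 has only one subsequence of length 3 (itself), no length-3 string can cover all three, so ACGT is also a shortest one.

```python
from typing import Iterable


def is_subsequence(seq: str, sup: str) -> bool:
    """Return True if seq can be obtained from sup by deleting characters."""
    it = iter(sup)
    # 'ch in it' consumes the iterator up to and including the first match,
    # so the characters of seq must appear in sup in the same order.
    return all(ch in it for ch in seq)


def is_common_supersequence(sup: str, seqs: Iterable[str]) -> bool:
    """Return True if every sequence in seqs is a subsequence of sup."""
    return all(is_subsequence(s, sup) for s in seqs)


if __name__ == "__main__":
    seqs = ["ACG", "AGT", "CGT"]
    print(is_common_supersequence("ACGT", seqs))  # True
    print(is_common_supersequence("AGCT", seqs))  # False: no G after C
```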

RESULTS

In this paper, we present a Deposition and Reduction (DR) algorithm for solving large SCS instances of biological sequences. The DR algorithm consists of two processes: a deposition process and a reduction process. The deposition process generates a small set of common supersequences; the reduction process shortens these common supersequences by removing some characters while preserving the common supersequence property. Our evaluation on simulated data and on real DNA and protein sequences shows that our algorithm consistently produces the best results compared with many well-known heuristic algorithms, especially on large instances.
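The abstract does not give the details of either process, so the following is only a minimal sketch of the two-phase idea under our own assumptions. `majority_merge` stands in for deposition using the well-known Majority Merge heuristic, which is not the authors' deposition procedure and produces a single candidate rather than a small set. `reduce_supersequence` is a hypothetical greedy pass that captures the stated idea of deleting characters while the string remains a common supersequence. All function names are illustrative.

```python
from collections import Counter
from typing import List


def is_subsequence(seq: str, sup: str) -> bool:
    """True if seq can be obtained from sup by deleting characters."""
    it = iter(sup)
    return all(ch in it for ch in seq)


def is_common_supersequence(sup: str, seqs: List[str]) -> bool:
    return all(is_subsequence(s, sup) for s in seqs)


def majority_merge(seqs: List[str]) -> str:
    """Stand-in deposition: build one common supersequence left to right.

    At each step, append the character that occurs most often among the next
    unconsumed characters of all sequences (ties go to the smallest character),
    then advance every sequence whose next character matches it.
    """
    pos = [0] * len(seqs)  # index of the next unconsumed character per sequence
    out = []
    while True:
        fronts = Counter(s[p] for s, p in zip(seqs, pos) if p < len(s))
        if not fronts:
            break  # every sequence has been fully consumed
        ch = max(sorted(fronts), key=lambda c: fronts[c])
        out.append(ch)
        for i, s in enumerate(seqs):
            if pos[i] < len(s) and s[pos[i]] == ch:
                pos[i] += 1
    return "".join(out)


def reduce_supersequence(sup: str, seqs: List[str]) -> str:
    """Hypothetical reduction: one greedy left-to-right deletion pass.

    A character is dropped whenever the shorter string is still a common
    supersequence of seqs; otherwise it is kept and the scan moves on.
    """
    i = 0
    while i < len(sup):
        candidate = sup[:i] + sup[i + 1:]
        if is_common_supersequence(candidate, seqs):
            sup = candidate  # safe deletion; re-test the same index
        else:
            i += 1
    return sup


if __name__ == "__main__":
    seqs = ["ACG", "AGT", "CGT"]
    deposited = majority_merge(seqs)
    print(deposited, reduce_supersequence(deposited, seqs))  # ACGT ACGT
    print(reduce_supersequence("AACGGTT", seqs))  # ACGT
```

Because the reduction pass only deletes characters and checks the property after each deletion, its output is never longer than its input and is always a common supersequence. It is order-dependent, however, and need not return a shortest supersequence.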

CONCLUSION

Our DR algorithm provides a partial answer to the open problem of designing efficient heuristic algorithms for the SCS problem on many long sequences. Our algorithm has a bounded approximation ratio. The algorithm is efficient in both running time and space complexity, and our evaluation shows that it is practical even for SCS instances with many long sequences.
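The abstract does not state the value of the ratio. As an illustration only of how a bounded ratio can arise for SCS heuristics, the classic Alphabet heuristic repeats the alphabet n times, where n is the length of the longest input. Its output has length |Σ|·n, and any SCS has length at least n, so the ratio is at most |Σ|. A reduction step that only deletes characters cannot make such a bound worse. Whether the DR bound takes this form is not stated here; the sketch assumes every input character belongs to the given alphabet, and the function names are hypothetical.

```python
from typing import List


def is_subsequence(seq: str, sup: str) -> bool:
    """True if seq can be obtained from sup by deleting characters."""
    it = iter(sup)
    return all(ch in it for ch in seq)


def alphabet_supersequence(seqs: List[str], alphabet: str) -> str:
    """Alphabet heuristic: repeat the alphabet n times, n = longest input.

    The k-th character of every input can be matched inside the k-th copy of
    the alphabet block, so the result is always a common supersequence
    (assuming all input characters occur in alphabet).
    """
    n = max((len(s) for s in seqs), default=0)
    return alphabet * n


def ratio_upper_bound(seqs: List[str], alphabet: str) -> float:
    """Upper bound on len(heuristic output) / len(optimal SCS).

    Any common supersequence is at least as long as the longest input n,
    so the ratio is at most (|alphabet| * n) / n = |alphabet|.
    """
    n = max(len(s) for s in seqs)
    return len(alphabet_supersequence(seqs, alphabet)) / n


if __name__ == "__main__":
    seqs = ["ACG", "AGT", "CGT"]
    sup = alphabet_supersequence(seqs, "ACGT")
    print(len(sup), all(is_subsequence(s, sup) for s in seqs))  # 12 True
    print(ratio_upper_bound(seqs, "ACGT"))  # 4.0
```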

Similar articles

Coevolving solutions to the shortest common superstring problem.

Biosystems. 2004 Aug-Oct;76(1-3):209-16. doi: 10.1016/j.biosystems.2004.05.013.

Deposition and extension approach to find longest common subsequence for thousands of long sequences.

Comput Biol Chem. 2010 Jun;34(3):149-57. doi: 10.1016/j.compbiolchem.2010.05.001. Epub 2010 May 11.

Chemical reaction optimization for solving shortest common supersequence problem.

Comput Biol Chem. 2016 Oct;64:82-93. doi: 10.1016/j.compbiolchem.2016.05.004. Epub 2016 May 31.

An improved chemical reaction optimization algorithm for solving the shortest common supersequence problem.

Comput Biol Chem. 2020 Oct;88:107327. doi: 10.1016/j.compbiolchem.2020.107327. Epub 2020 Jul 3.

Residue-rotamer-reduction algorithm for the protein side-chain conformation problem.

Bioinformatics. 2006 Jan 15;22(2):188-94. doi: 10.1093/bioinformatics/bti763. Epub 2005 Nov 8.

PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences.

Bioinformatics. 2007 Feb 1;23(3):372-4. doi: 10.1093/bioinformatics/btl592. Epub 2006 Nov 21.

A space-efficient algorithm for the constrained pairwise sequence alignment problem.

Genome Inform. 2005;16(2):237-46.

Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost.

BMC Bioinformatics. 2006 Dec 1;7:524. doi: 10.1186/1471-2105-7-524.

An enhanced beam search algorithm for the Shortest Common Supersequence Problem.

Eng Appl Artif Intell. 2012 Apr;25(3):457-467. doi: 10.1016/j.engappai.2011.08.006. Epub 2011 Sep 20.

Cited by

An Opposition-Based Learning CRO Algorithm for Solving the Shortest Common Supersequence Problem.

Entropy (Basel). 2022 May 3;24(5):641. doi: 10.3390/e24050641.

An enhanced beam search algorithm for the Shortest Common Supersequence Problem.

Eng Appl Artif Intell. 2012 Apr;25(3):457-467. doi: 10.1016/j.engappai.2011.08.006. Epub 2011 Sep 20.

A multilevel probabilistic beam search algorithm for the shortest common supersequence problem.

PLoS One. 2012;7(12):e52427. doi: 10.1371/journal.pone.0052427. Epub 2012 Dec 27.

Development of computations in bioscience and bioinformatics and its application: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB06).

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S1. doi: 10.1186/1471-2105-7-S4-S1.
