Using the Sadakane compressed suffix tree to solve the all-pairs suffix-prefix problem.

Authors

Rachid Maan Haj, Malluhi Qutaibah, Abouelhoda Mohamed

Affiliations

KINDI Lab for Computing Research, Qatar University, P.O. Box 2713, Doha, Qatar.

Faculty of Engineering, Cairo University, Giza, Egypt; Center for Informatics Sciences, Nile University, Giza, Egypt.

Publication information

Biomed Res Int. 2014;2014:745298. doi: 10.1155/2014/745298. Epub 2014 Apr 16.

DOI:10.1155/2014/745298

PMID:24834435

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4009283/

Abstract

The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in de novo genome assembly, one of the major problems in bioinformatics. Because the input data are large, fast and space-efficient solutions are crucial. In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm that provides additional speed on shared-memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on Sadakane's compressed index consumes significantly less space than solutions based on noncompressed data structures such as the suffix tree and the enhanced suffix array. They also show that our parallel algorithm is efficient and scales well as the number of processors increases.
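To make the problem statement concrete, the following is a minimal sketch of what the all-pairs suffix-prefix problem computes: for every ordered pair of input strings, the length of the longest suffix of the first that is also a prefix of the second. It is a naive quadratic baseline, not the compressed-suffix-tree algorithm of the paper, and the function names and the sample reads are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def longest_suffix_prefix(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    # Try candidate overlap lengths from longest to shortest.
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def all_pairs_suffix_prefix(reads: List[str]) -> Dict[Tuple[int, int], int]:
    """Naive baseline (assumption: not the paper's method).

    Returns the overlap length for every ordered pair (i, j) with i != j.
    """
    overlaps: Dict[Tuple[int, int], int] = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                overlaps[(i, j)] = longest_suffix_prefix(a, b)
    return overlaps


# Hypothetical sample reads, for illustration only.
reads = ["ACGTA", "GTACC", "CCACG"]
result = all_pairs_suffix_prefix(reads)
for (i, j), length in sorted(result.items()):
    print(f"suffix of read {i} / prefix of read {j}: overlap {length}")
```

This baseline compares every pair of strings directly, so its cost grows quadratically with the number of strings. Index-based solutions such as the one described in the abstract aim to avoid that pairwise comparison while keeping memory use low.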

Similar articles

Using the Sadakane compressed suffix tree to solve the all-pairs suffix-prefix problem.

Biomed Res Int. 2014;2014:745298. doi: 10.1155/2014/745298. Epub 2014 Apr 16.

Breaking the O(n)-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees.

Proc Annu ACM SIAM Symp Discret Algorithms. 2023;2023:5122-5202. doi: 10.1137/1.9781611977554.ch187.

Compressed suffix tree--a basis for genome-scale sequence analysis.

Bioinformatics. 2007 Mar 1;23(5):629-30. doi: 10.1093/bioinformatics/btl681. Epub 2007 Jan 19.

Indexing huge genome sequences for solving various problems.

Genome Inform. 2001;12:175-83.

Computing matching statistics on Wheeler DFAs.

Proc Data Compress Conf. 2023 Mar;2023:150-159. doi: 10.1109/dcc55655.2023.00023. Epub 2023 May 19.

A Practical and Scalable Tool to Find Overlaps between Sequences.

Biomed Res Int. 2015;2015:905261. doi: 10.1155/2015/905261. Epub 2015 Apr 19.

gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections.

Algorithms Mol Biol. 2020 Sep 22;15:18. doi: 10.1186/s13015-020-00177-y. eCollection 2020.

Suffix sorting via matching statistics.

Algorithms Mol Biol. 2024 Mar 12;19(1):11. doi: 10.1186/s13015-023-00245-z.

Generalized enhanced suffix array construction in external memory.

Algorithms Mol Biol. 2017 Dec 7;12:26. doi: 10.1186/s13015-017-0117-9. eCollection 2017.

Sorting permutations by prefix and suffix rearrangements.

J Bioinform Comput Biol. 2017 Feb;15(1):1750002. doi: 10.1142/S0219720017500020. Epub 2017 Feb 9.

Cited by

Two Efficient Techniques to Find Approximate Overlaps between Sequences.

Biomed Res Int. 2017;2017:2731385. doi: 10.1155/2017/2731385. Epub 2017 Feb 15.

A Practical and Scalable Tool to Find Overlaps between Sequences.

Biomed Res Int. 2015;2015:905261. doi: 10.1155/2015/905261. Epub 2015 Apr 19.

References

Efficient de novo assembly of large genomes using compressed data structures.

Genome Res. 2012 Mar;22(3):549-56. doi: 10.1101/gr.126953.111. Epub 2011 Dec 7.

Compressed suffix tree--a basis for genome-scale sequence analysis.

Bioinformatics. 2007 Mar 1;23(5):629-30. doi: 10.1093/bioinformatics/btl681. Epub 2007 Jan 19.

The fragment assembly string graph.

Bioinformatics. 2005 Sep 1;21 Suppl 2:ii79-85. doi: 10.1093/bioinformatics/bti1114.
