Suppr超能文献

对增强型Burrows-Wheeler变换进行r索引

r-indexing the eBWT.

作者信息

Boucher Christina, Cenzato Davide, Lipták Zsuzsanna, Rossi Massimiliano, Sciortino Marinella

机构信息

Dept of Computer and Information Science and Engineering, University of Florida.

Dept of Computer Science, University of Verona, Italy.

出版信息

Int Symp String Process Inf Retr. 2021 Oct;12944:3-12. doi: 10.1007/978-3-030-86692-1_1. Epub 2021 Sep 27.

Abstract

The extended Burrows Wheeler Transform (eBWT) was introduced by Mantaci et al. [TCS 2007] to extend the definition of the BWT to a collection of strings. In our prior work [SPIRE 2021], we give a linear-time algorithm for the eBWT that preserves the fundamental property of the original definition (i.e., the independence from the input order). The algorithm combines a modification of the Suffix Array Induced Sorting (SAIS) algorithm [IEEE Trans Comput 2011] with Prefix Free Parsing [AMB 2019; JCB 2020]. In this paper, we show how this construction algorithm leads to -indexing the eBWT, i.e., run-length encoded eBWT and SA samples of Gagie et al. [SODA 2018] can be constructed efficiently from the components of the PFP. Moreover, we show that finding maximal exact matches (MEMs) between a query string and the -index of the eBWT can be efficiently supported.

摘要

扩展的Burrows Wheeler变换(eBWT)由曼塔奇等人于2007年在《理论计算机科学》中提出,用于将BWT的定义扩展到字符串集合。在我们之前的工作[《SPIRE 2021》]中,我们给出了一种针对eBWT的线性时间算法,该算法保留了原始定义的基本属性(即与输入顺序无关)。该算法将后缀数组诱导排序(SAIS)算法[《IEEE计算机汇刊》2011年]的一种改进与无前缀解析[《AMB 2019》;《JCB 2020》]相结合。在本文中,我们展示了这种构造算法如何导致对eBWT进行索引,即可以从PFP的组件高效地构建游程编码的eBWT和加吉等人[《SODA 2018》]的SA样本。此外,我们表明可以有效地支持在查询字符串与eBWT的索引之间查找最大精确匹配(MEMs)。

相似文献

1
r-indexing the eBWT.对增强型Burrows-Wheeler变换进行r索引
Int Symp String Process Inf Retr. 2021 Oct;12944:3-12. doi: 10.1007/978-3-030-86692-1_1. Epub 2021 Sep 27.
2
Computing the original eBWT faster, simpler, and with less memory.更快、更简单且占用更少内存地计算原始增强型Burrows-Wheeler变换。
Int Symp String Process Inf Retr. 2021 Oct;12944:129-142. doi: 10.1007/978-3-030-86692-1_11. Epub 2021 Sep 27.
4
MONI: A Pangenomic Index for Finding Maximal Exact Matches.MONI:用于寻找最大精确匹配的泛基因组索引。
J Comput Biol. 2022 Feb;29(2):169-187. doi: 10.1089/cmb.2021.0290. Epub 2022 Jan 17.
5
PFP Compressed Suffix Trees.PFP压缩后缀树
Proc Worksh Algorithm Eng Exp. 2021;2021:60-72. doi: 10.1137/1.9781611976472.5.
10
Generalized enhanced suffix array construction in external memory.外部内存中的广义增强后缀数组构造
Algorithms Mol Biol. 2017 Dec 7;12:26. doi: 10.1186/s13015-017-0117-9. eCollection 2017.

引用本文的文献

本文引用的文献

1
Computing the original eBWT faster, simpler, and with less memory.更快、更简单且占用更少内存地计算原始增强型Burrows-Wheeler变换。
Int Symp String Process Inf Retr. 2021 Oct;12944:129-142. doi: 10.1007/978-3-030-86692-1_11. Epub 2021 Sep 27.
3
Prefix-free parsing for building big BWTs.用于构建大型Burrows-Wheeler变换(BWT)的无前缀解析
Algorithms Mol Biol. 2019 May 24;14:13. doi: 10.1186/s13015-019-0148-5. eCollection 2019.
5
RPAN: rice pan-genome browser for ∼3000 rice genomes.RPAN:用于约3000个水稻基因组的水稻泛基因组浏览器。
Nucleic Acids Res. 2017 Jan 25;45(2):597-605. doi: 10.1093/nar/gkw958. Epub 2016 Dec 10.
8
Fast and accurate short read alignment with Burrows-Wheeler transform.使用Burrows-Wheeler变换进行快速准确的短读比对。
Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18.

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验