Boucher Christina, Cenzato Davide, Lipták Zsuzsanna, Rossi Massimiliano, Sciortino Marinella
Dept of Computer and Information Science and Engineering, University of Florida.
Dept of Computer Science, University of Verona, Italy.
Int Symp String Process Inf Retr. 2021 Oct;12944:3-12. doi: 10.1007/978-3-030-86692-1_1. Epub 2021 Sep 27.
The extended Burrows Wheeler Transform (eBWT) was introduced by Mantaci et al. [TCS 2007] to extend the definition of the BWT to a collection of strings. In our prior work [SPIRE 2021], we give a linear-time algorithm for the eBWT that preserves the fundamental property of the original definition (i.e., the independence from the input order). The algorithm combines a modification of the Suffix Array Induced Sorting (SAIS) algorithm [IEEE Trans Comput 2011] with Prefix Free Parsing [AMB 2019; JCB 2020]. In this paper, we show how this construction algorithm leads to -indexing the eBWT, i.e., run-length encoded eBWT and SA samples of Gagie et al. [SODA 2018] can be constructed efficiently from the components of the PFP. Moreover, we show that finding maximal exact matches (MEMs) between a query string and the -index of the eBWT can be efficiently supported.
扩展的Burrows Wheeler变换(eBWT)由曼塔奇等人于2007年在《理论计算机科学》中提出,用于将BWT的定义扩展到字符串集合。在我们之前的工作[《SPIRE 2021》]中,我们给出了一种针对eBWT的线性时间算法,该算法保留了原始定义的基本属性(即与输入顺序无关)。该算法将后缀数组诱导排序(SAIS)算法[《IEEE计算机汇刊》2011年]的一种改进与无前缀解析[《AMB 2019》;《JCB 2020》]相结合。在本文中,我们展示了这种构造算法如何导致对eBWT进行索引,即可以从PFP的组件高效地构建游程编码的eBWT和加吉等人[《SODA 2018》]的SA样本。此外,我们表明可以有效地支持在查询字符串与eBWT的索引之间查找最大精确匹配(MEMs)。