Cotumaccio Nicola, Gagie Travis, Köppl Dominik, Prezza Nicola
GSSI, Italy.
Dalhousie University, Canada.
Int Symp String Process Inf Retr. 2023 Sep;14240:143-156. doi: 10.1007/978-3-031-43980-3_12. Epub 2023 Sep 20.
Recently, Conte et al. generalized the longest-common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array requires bits, being the number of states, while the compact representation of Wheeler DFAs often requires much less space. In particular, the BOSS representation of a de Bruijn graph only requires a linear number of bits, if the size of alphabet is constant. In this paper, we propose a sampling technique that allows to access an entry of the LCP array in logarithmic time by only storing a linear number of bits. We use our technique to provide a space-time tradeoff to compute matching statistics on a Wheeler DFA. In addition, we show that by augmenting the BOSS representation of a -th order de Bruijn graph with a linear number of bits we can navigate the underlying variable-order de Bruijn graph in time logarithmic in , thus improving a previous bound by Boucher et al. which was linear in [DCC 2015].
最近,孔特等人将最长公共前缀(LCP)数组从字符串推广到惠勒确定有限自动机(DFA),并且他们表明它可用于高效地确定惠勒DFA上的匹配统计信息[DCC 2023]。然而,存储LCP数组需要 位, 是状态数,而惠勒DFA的紧凑表示通常需要少得多的空间。特别地,如果字母表大小是常数,德布鲁因图的BOSS表示仅需要线性数量的位。在本文中,我们提出一种采样技术,该技术通过仅存储线性数量的位就能在对数时间内访问LCP数组的一个条目。我们使用我们的技术来提供一种时空权衡,以计算惠勒DFA上的匹配统计信息。此外,我们表明通过用线性数量的位扩充第 阶德布鲁因图的BOSS表示,我们可以在 的对数时间内遍历底层的可变阶德布鲁因图,从而改进了布歇等人之前的一个界限,该界限在 上是线性的[DCC 2015]。