Suppr超能文献

惠勒确定有限自动机(DFA)的LCP数组的时空权衡

Space-time Trade-offs for the LCP Array of Wheeler DFAs.

作者信息

Cotumaccio Nicola, Gagie Travis, Köppl Dominik, Prezza Nicola

机构信息

GSSI, Italy.

Dalhousie University, Canada.

出版信息

Int Symp String Process Inf Retr. 2023 Sep;14240:143-156. doi: 10.1007/978-3-031-43980-3_12. Epub 2023 Sep 20.

Abstract

Recently, Conte et al. generalized the longest-common prefix (LCP) array from strings to Wheeler DFAs, and they showed that it can be used to efficiently determine matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array requires bits, being the number of states, while the compact representation of Wheeler DFAs often requires much less space. In particular, the BOSS representation of a de Bruijn graph only requires a linear number of bits, if the size of alphabet is constant. In this paper, we propose a sampling technique that allows to access an entry of the LCP array in logarithmic time by only storing a linear number of bits. We use our technique to provide a space-time tradeoff to compute matching statistics on a Wheeler DFA. In addition, we show that by augmenting the BOSS representation of a -th order de Bruijn graph with a linear number of bits we can navigate the underlying variable-order de Bruijn graph in time logarithmic in , thus improving a previous bound by Boucher et al. which was linear in [DCC 2015].

摘要

最近,孔特等人将最长公共前缀(LCP)数组从字符串推广到惠勒确定有限自动机(DFA),并且他们表明它可用于高效地确定惠勒DFA上的匹配统计信息[DCC 2023]。然而,存储LCP数组需要 位, 是状态数,而惠勒DFA的紧凑表示通常需要少得多的空间。特别地,如果字母表大小是常数,德布鲁因图的BOSS表示仅需要线性数量的位。在本文中,我们提出一种采样技术,该技术通过仅存储线性数量的位就能在对数时间内访问LCP数组的一个条目。我们使用我们的技术来提供一种时空权衡,以计算惠勒DFA上的匹配统计信息。此外,我们表明通过用线性数量的位扩充第 阶德布鲁因图的BOSS表示,我们可以在 的对数时间内遍历底层的可变阶德布鲁因图,从而改进了布歇等人之前的一个界限,该界限在 上是线性的[DCC 2015]。

相似文献

1
Space-time Trade-offs for the LCP Array of Wheeler DFAs.惠勒确定有限自动机(DFA)的LCP数组的时空权衡
Int Symp String Process Inf Retr. 2023 Sep;14240:143-156. doi: 10.1007/978-3-031-43980-3_12. Epub 2023 Sep 20.
2
Computing matching statistics on Wheeler DFAs.计算惠勒确定有限自动机上的匹配统计信息。
Proc Data Compress Conf. 2023 Mar;2023:150-159. doi: 10.1109/dcc55655.2023.00023. Epub 2023 May 19.
4
Buffering updates enables efficient dynamic de Bruijn graphs.缓冲更新可实现高效的动态德布鲁因图。
Comput Struct Biotechnol J. 2021 Jul 6;19:4067-4078. doi: 10.1016/j.csbj.2021.06.047. eCollection 2021.
6
On the Hardness of Sequence Alignment on De Bruijn Graphs.基于 De Bruijn 图的序列比对问题的难度研究
J Comput Biol. 2022 Dec;29(12):1377-1396. doi: 10.1089/cmb.2022.0411. Epub 2022 Nov 25.

本文引用的文献

1
Computing matching statistics on Wheeler DFAs.计算惠勒确定有限自动机上的匹配统计信息。
Proc Data Compress Conf. 2023 Mar;2023:150-159. doi: 10.1109/dcc55655.2023.00023. Epub 2023 May 19.
5
ABySS: a parallel assembler for short read sequence data.ABySS:一种用于短读长序列数据的并行汇编器。
Genome Res. 2009 Jun;19(6):1117-23. doi: 10.1101/gr.089532.108. Epub 2009 Feb 27.
6
An Eulerian path approach to DNA fragment assembly.一种用于DNA片段组装的欧拉路径方法。
Proc Natl Acad Sci U S A. 2001 Aug 14;98(17):9748-53. doi: 10.1073/pnas.171285098.
7
A new algorithm for DNA sequence assembly.一种用于DNA序列组装的新算法。
J Comput Biol. 1995 Summer;2(2):291-306. doi: 10.1089/cmb.1995.2.291.

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验