Goga Adrián, Depuydt Lore, Brown Nathaniel K, Fostier Jan, Gagie Travis, Navarro Gonzalo
Dept. of Comp. Sci., Comenius University, Bratislava, Slovakia.
Dept. of Inf. Tech., Ghent University, Ghent, Belgium.
Proc Data Compress Conf. 2024 Mar;2024:123-132. doi: 10.1109/dcc58796.2024.00020. Epub 2024 May 21.
MONI (Rossi et al., 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries. In this paper we show how (a variant of) the latter can be evaluated lazily, so as to bound the total time MONI needs to process the pattern in terms of the number of MEMs between the pattern and the text, while maintaining logarithmic latency.
MONI(罗西等人,2022年)是一种基于Burrows-Wheeler变换(BWT)的压缩索引,用于通过两种操作计算模式(通常是DNA读段)相对于高度重复文本(通常是基因组数据库)的匹配统计信息和最大精确匹配(MEM):对文本的语法压缩表示进行LF步长和最长公共扩展(LCE)查询。在实际应用中,大多数操作是常数时间的LF步长,但大部分时间都花在评估LCE查询上。在本文中,我们展示了如何对后者(的一个变体)进行惰性评估,以便根据模式与文本之间的MEM数量来限制MONI处理模式所需的总时间,同时保持对数延迟。