Suppr超能文献

通过匹配统计进行后缀排序。

Suffix sorting via matching statistics.

作者信息

Lipták Zsuzsanna, Masillo Francesco, Puglisi Simon J

机构信息

Department of Computer Science, University of Verona, Verona, Italy.

Helsinki Institute for Information Technology (HIIT), Helsinki, Finland.

出版信息

Algorithms Mol Biol. 2024 Mar 12;19(1):11. doi: 10.1186/s13015-023-00245-z.

Abstract

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

摘要

我们介绍了一种用于构建高度相似字符串集合的广义后缀数组的新算法。第一步,我们构建该集合相对于一个参考字符串的匹配统计信息的压缩表示。然后,我们使用此数据结构将后缀分配到一个偏序中,并随后加速后缀比较以完成广义后缀数组的构建。我们使用原型实现(我们称之为sacamats的工具)的实验证据表明,对于具有高度相似字符串的字符串集合,我们能够以与最快的现有方法相当或更快的时间构建后缀数组。在此过程中,我们描述了一种用于快速计算两个字符串匹配统计信息的启发式方法,该方法可能具有独立的研究价值。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/eca2c731bf3e/13015_2023_245_Figa_HTML.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验