• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

通过匹配统计进行后缀排序。

Suffix sorting via matching statistics.

作者信息

Lipták Zsuzsanna, Masillo Francesco, Puglisi Simon J

机构信息

Department of Computer Science, University of Verona, Verona, Italy.

Helsinki Institute for Information Technology (HIIT), Helsinki, Finland.

出版信息

Algorithms Mol Biol. 2024 Mar 12;19(1):11. doi: 10.1186/s13015-023-00245-z.

DOI:10.1186/s13015-023-00245-z
PMID:38475889
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC10935992/
Abstract

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

摘要

我们介绍了一种用于构建高度相似字符串集合的广义后缀数组的新算法。第一步,我们构建该集合相对于一个参考字符串的匹配统计信息的压缩表示。然后,我们使用此数据结构将后缀分配到一个偏序中,并随后加速后缀比较以完成广义后缀数组的构建。我们使用原型实现(我们称之为sacamats的工具)的实验证据表明,对于具有高度相似字符串的字符串集合,我们能够以与最快的现有方法相当或更快的时间构建后缀数组。在此过程中,我们描述了一种用于快速计算两个字符串匹配统计信息的启发式方法,该方法可能具有独立的研究价值。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/b0bf0a224f5d/13015_2023_245_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/eca2c731bf3e/13015_2023_245_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/8199369a432b/13015_2023_245_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/aed6392643cf/13015_2023_245_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/f58e1eab1948/13015_2023_245_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/e16008ef157d/13015_2023_245_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/ddd61c0a90a0/13015_2023_245_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/c7b98005048e/13015_2023_245_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/5092a5824e76/13015_2023_245_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/6255e966a695/13015_2023_245_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/3682240fb614/13015_2023_245_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/ca5a5fa37e0d/13015_2023_245_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/7d9fd7738b7d/13015_2023_245_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/3d1c3649ea2d/13015_2023_245_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/06512b6bdced/13015_2023_245_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/55aaf89266e0/13015_2023_245_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/6d9479c30f97/13015_2023_245_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/b0bf0a224f5d/13015_2023_245_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/eca2c731bf3e/13015_2023_245_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/8199369a432b/13015_2023_245_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/aed6392643cf/13015_2023_245_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/f58e1eab1948/13015_2023_245_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/e16008ef157d/13015_2023_245_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/ddd61c0a90a0/13015_2023_245_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/c7b98005048e/13015_2023_245_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/5092a5824e76/13015_2023_245_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/6255e966a695/13015_2023_245_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/3682240fb614/13015_2023_245_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/ca5a5fa37e0d/13015_2023_245_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/7d9fd7738b7d/13015_2023_245_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/3d1c3649ea2d/13015_2023_245_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/06512b6bdced/13015_2023_245_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/55aaf89266e0/13015_2023_245_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/6d9479c30f97/13015_2023_245_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2d/10935992/b0bf0a224f5d/13015_2023_245_Fig16_HTML.jpg

相似文献

1
Suffix sorting via matching statistics.通过匹配统计进行后缀排序。
Algorithms Mol Biol. 2024 Mar 12;19(1):11. doi: 10.1186/s13015-023-00245-z.
2
gsufsort: constructing suffix arrays, LCP arrays and BWTs for string collections.gsufsort:为字符串集合构建后缀数组、最长公共前缀数组和Burrows-Wheeler变换
Algorithms Mol Biol. 2020 Sep 22;15:18. doi: 10.1186/s13015-020-00177-y. eCollection 2020.
3
Generalized enhanced suffix array construction in external memory.外部内存中的广义增强后缀数组构造
Algorithms Mol Biol. 2017 Dec 7;12:26. doi: 10.1186/s13015-017-0117-9. eCollection 2017.
4
Breaking the -Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees.突破压缩后缀数组和后缀树构建中的-障碍
Proc Annu ACM SIAM Symp Discret Algorithms. 2023;2023:5122-5202. doi: 10.1137/1.9781611977554.ch187.
5
Using the Sadakane compressed suffix tree to solve the all-pairs suffix-prefix problem.使用笹钟根压缩后缀树解决所有后缀对前缀问题。
Biomed Res Int. 2014;2014:745298. doi: 10.1155/2014/745298. Epub 2014 Apr 16.
6
Computing matching statistics on Wheeler DFAs.计算惠勒确定有限自动机上的匹配统计信息。
Proc Data Compress Conf. 2023 Mar;2023:150-159. doi: 10.1109/dcc55655.2023.00023. Epub 2023 May 19.
7
An Elegant Algorithm for the Construction of Suffix Arrays.一种构建后缀数组的优雅算法。
J Discrete Algorithms (Amst). 2014 Jul 1;27:21-28. doi: 10.1016/j.jda.2014.03.001.
8
Relative Suffix Trees.
Comput J. 2018 May;61(5):773-788. doi: 10.1093/comjnl/bxx108. Epub 2017 Nov 21.
9
Indexing huge genome sequences for solving various problems.为解决各种问题对庞大的基因组序列进行索引。
Genome Inform. 2001;12:175-83.
10
Computing the original eBWT faster, simpler, and with less memory.更快、更简单且占用更少内存地计算原始增强型Burrows-Wheeler变换。
Int Symp String Process Inf Retr. 2021 Oct;12944:129-142. doi: 10.1007/978-3-030-86692-1_11. Epub 2021 Sep 27.

引用本文的文献

1
Marker discovery in the large.在大型研究中发现标志物。
Bioinform Adv. 2024 Jul 27;4(1):vbae113. doi: 10.1093/bioadv/vbae113. eCollection 2024.

本文引用的文献

1
A draft human pangenome reference.人类泛基因组参考草图。
Nature. 2023 May;617(7960):312-324. doi: 10.1038/s41586-023-05896-x. Epub 2023 May 10.
2
Finding Maximal Exact Matches Using the r-Index.使用 r-索引查找最大精确匹配。
J Comput Biol. 2022 Feb;29(2):188-194. doi: 10.1089/cmb.2021.0445. Epub 2022 Jan 17.
3
Efficient Construction of a Complete Index for Pan-Genomics Read Alignment.高效构建全基因组读段比对的完整索引。
J Comput Biol. 2020 Apr;27(4):500-513. doi: 10.1089/cmb.2019.0309. Epub 2020 Mar 16.
4
Prefix-free parsing for building big BWTs.用于构建大型Burrows-Wheeler变换(BWT)的无前缀解析
Algorithms Mol Biol. 2019 May 24;14:13. doi: 10.1186/s13015-019-0148-5. eCollection 2019.
5
Variation graph toolkit improves read mapping by representing genetic variation in the reference.变异图谱工具包通过表示参考中的遗传变异来提高读映射质量。
Nat Biotechnol. 2018 Oct;36(9):875-879. doi: 10.1038/nbt.4227. Epub 2018 Aug 20.
6
Towards pan-genome read alignment to improve variation calling.迈向泛基因组读对齐以改善变异调用。
BMC Genomics. 2018 May 9;19(Suppl 2):87. doi: 10.1186/s12864-018-4465-8.
7
Computational pan-genomics: status, promises and challenges.计算泛基因组学:现状、前景与挑战。
Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089.
8
A global reference for human genetic variation.人类遗传变异的全球参考。
Nature. 2015 Oct 1;526(7571):68-74. doi: 10.1038/nature15393.