• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

通过延迟最长公共前缀(LCP)评估实现更快的最大精确匹配

Faster Maximal Exact Matches with Lazy LCP Evaluation.

作者信息

Goga Adrián, Depuydt Lore, Brown Nathaniel K, Fostier Jan, Gagie Travis, Navarro Gonzalo

机构信息

Dept. of Comp. Sci., Comenius University, Bratislava, Slovakia.

Dept. of Inf. Tech., Ghent University, Ghent, Belgium.

出版信息

Proc Data Compress Conf. 2024 Mar;2024:123-132. doi: 10.1109/dcc58796.2024.00020. Epub 2024 May 21.

DOI:10.1109/dcc58796.2024.00020
PMID:39157794
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC11328106/
Abstract

MONI (Rossi et al., 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries. In this paper we show how (a variant of) the latter can be evaluated lazily, so as to bound the total time MONI needs to process the pattern in terms of the number of MEMs between the pattern and the text, while maintaining logarithmic latency.

摘要

MONI(罗西等人,2022年)是一种基于Burrows-Wheeler变换(BWT)的压缩索引,用于通过两种操作计算模式(通常是DNA读段)相对于高度重复文本(通常是基因组数据库)的匹配统计信息和最大精确匹配(MEM):对文本的语法压缩表示进行LF步长和最长公共扩展(LCE)查询。在实际应用中,大多数操作是常数时间的LF步长,但大部分时间都花在评估LCE查询上。在本文中,我们展示了如何对后者(的一个变体)进行惰性评估,以便根据模式与文本之间的MEM数量来限制MONI处理模式所需的总时间,同时保持对数延迟。

相似文献

1
Faster Maximal Exact Matches with Lazy LCP Evaluation.通过延迟最长公共前缀(LCP)评估实现更快的最大精确匹配
Proc Data Compress Conf. 2024 Mar;2024:123-132. doi: 10.1109/dcc58796.2024.00020. Epub 2024 May 21.
2
Augmented Thresholds for MONI.MONI的增强阈值。
Proc Data Compress Conf. 2023 Mar;2023:268-277. doi: 10.1109/dcc55655.2023.00035. Epub 2023 May 19.
3
MONI: A Pangenomic Index for Finding Maximal Exact Matches.MONI:用于寻找最大精确匹配的泛基因组索引。
J Comput Biol. 2022 Feb;29(2):169-187. doi: 10.1089/cmb.2021.0290. Epub 2022 Jan 17.
4
PFP Compressed Suffix Trees.PFP压缩后缀树
Proc Worksh Algorithm Eng Exp. 2021;2021:60-72. doi: 10.1137/1.9781611976472.5.
5
Finding Maximal Exact Matches Using the r-Index.使用 r-索引查找最大精确匹配。
J Comput Biol. 2022 Feb;29(2):188-194. doi: 10.1089/cmb.2021.0445. Epub 2022 Jan 17.
6
How to Find Long Maximal Exact Matches and Ignore Short Ones.如何找到长的最大精确匹配并忽略短的匹配。
Dev Lang Theory. 2024 Aug;14791:131-140. doi: 10.1007/978-3-031-66159-4_10. Epub 2024 Jul 27.
7
PHONI: Streamed Matching Statistics with Multi-Genome References.PHONI:多基因组参考的流式匹配统计
Proc Data Compress Conf. 2021 Mar;2021:193-202. doi: 10.1109/dcc50243.2021.00027. Epub 2021 May 10.
8
MEM-based pangenome indexing for -mer queries.基于MEM的用于k-mer查询的泛基因组索引
bioRxiv. 2024 May 22:2024.05.20.595044. doi: 10.1101/2024.05.20.595044.
9
Finding maximal exact matches in graphs.在图中寻找最大精确匹配。
Algorithms Mol Biol. 2024 Mar 11;19(1):10. doi: 10.1186/s13015-024-00255-5.
10
Taxonomic classification with maximal exact matches in KATKA kernels and minimizer digests.在KATKA内核和最小化器摘要中具有最大精确匹配的分类学分类。
Lebniz Int Proc Inform. 2024 Jul;301. doi: 10.4230/LIPIcs.SEA.2024.10. Epub 2024 Jul 11.

引用本文的文献

1
b-move: faster lossless approximate pattern matching in a run-length compressed index.b移动:在游程长度压缩索引中实现更快的无损近似模式匹配。
Algorithms Mol Biol. 2025 Aug 12;20(1):15. doi: 10.1186/s13015-025-00281-x.
2
Run-length compressed metagenomic read classification with SMEM-finding and tagging.基于SMEM查找和标记的游程长度压缩宏基因组读取分类
bioRxiv. 2025 Mar 24:2025.02.25.640119. doi: 10.1101/2025.02.25.640119.
3
b-move: Faster Lossless Approximate Pattern Matching in a Run-Length Compressed Index.b移动:游程长度压缩索引中的更快无损近似模式匹配
Res Sq. 2024 Nov 18:rs.3.rs-5367343. doi: 10.21203/rs.3.rs-5367343/v1.
4
How to Find Long Maximal Exact Matches and Ignore Short Ones.如何找到长的最大精确匹配并忽略短的匹配。
Dev Lang Theory. 2024 Aug;14791:131-140. doi: 10.1007/978-3-031-66159-4_10. Epub 2024 Jul 27.
5
b-move: faster bidirectional character extensions in a run-length compressed index.b移动:游程长度压缩索引中更快的双向字符扩展
bioRxiv. 2024 Jun 2:2024.05.30.596587. doi: 10.1101/2024.05.30.596587.

本文引用的文献

1
Sigmoni: classification of nanopore signal with a compressed pangenome index.西格蒙尼:使用压缩泛基因组索引对纳米孔信号进行分类。
Bioinformatics. 2024 Jun 28;40(Suppl 1):i287-i296. doi: 10.1093/bioinformatics/btae213.
2
Augmented Thresholds for MONI.MONI的增强阈值。
Proc Data Compress Conf. 2023 Mar;2023:268-277. doi: 10.1109/dcc55655.2023.00035. Epub 2023 May 19.
3
SPUMONI 2: improved classification using a pangenome index of minimizer digests.SPUMONI 2:使用最小化消化物的泛基因组指数进行改进分类。
Genome Biol. 2023 May 18;24(1):122. doi: 10.1186/s13059-023-02958-1.
4
MONI: A Pangenomic Index for Finding Maximal Exact Matches.MONI:用于寻找最大精确匹配的泛基因组索引。
J Comput Biol. 2022 Feb;29(2):169-187. doi: 10.1089/cmb.2021.0290. Epub 2022 Jan 17.
5
Pan-genomic matching statistics for targeted nanopore sequencing.靶向纳米孔测序的泛基因组匹配统计
iScience. 2021 Jun 8;24(6):102696. doi: 10.1016/j.isci.2021.102696. eCollection 2021 Jun 25.
6
Fast gapped-read alignment with Bowtie 2.快速缺口读对准与 Bowtie 2。
Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923.
7
Storage and retrieval of highly repetitive sequence collections.高度重复序列集合的存储与检索。
J Comput Biol. 2010 Mar;17(3):281-308. doi: 10.1089/cmb.2009.0169.
8
Fast and accurate short read alignment with Burrows-Wheeler transform.使用Burrows-Wheeler变换进行快速准确的短读比对。
Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18.
9
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.短DNA序列与人类基因组的超快速且内存高效比对。
Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25. Epub 2009 Mar 4.