References cited in this article

1
MONI: A Pangenomic Index for Finding Maximal Exact Matches.
J Comput Biol. 2022 Feb;29(2):169-187. doi: 10.1089/cmb.2021.0290. Epub 2022 Jan 17.
2
Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED.
Nat Biotechnol. 2021 Apr;39(4):431-441. doi: 10.1038/s41587-020-0731-9. Epub 2020 Nov 30.
3
Rapid identification of pathogens, antibiotic resistance genes and plasmids in blood cultures by nanopore sequencing.
Sci Rep. 2020 May 6;10(1):7622. doi: 10.1038/s41598-020-64616-x.
4
Portable nanopore analytics: are we there yet?
Bioinformatics. 2020 Aug 15;36(16):4399-4405. doi: 10.1093/bioinformatics/btaa237.
5
Efficient Construction of a Complete Index for Pan-Genomics Read Alignment.
J Comput Biol. 2020 Apr;27(4):500-513. doi: 10.1089/cmb.2019.0309. Epub 2020 Mar 16.
6
Real-Time Selective Sequencing with RUBRIC: Read Until with Basecall and Reference-Informed Criteria.
Sci Rep. 2019 Aug 7;9(1):11475. doi: 10.1038/s41598-019-47857-3.
7
Prefix-free parsing for building big BWTs.
Algorithms Mol Biol. 2019 May 24;14:13. doi: 10.1186/s13015-019-0148-5. eCollection 2019.

PHONI: Streamed Matching Statistics with Multi-Genome References.

Authors

Boucher Christina, Gagie Travis, Tomohiro I, Köppl Dominik, Langmead Ben, Manzini Giovanni, Navarro Gonzalo, Pacheco Alejandro, Rossi Massimiliano

Affiliations

U Florida Gainesville, USA.

Dalhousie U Halifax, Canada.

Publication information

Proc Data Compress Conf. 2021 Mar;2021:193-202. doi: 10.1109/dcc50243.2021.00027. Epub 2021 May 10.

DOI:10.1109/dcc50243.2021.00027
PMID:34778549
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8583545/
Abstract

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database. Our code is available at https://github.com/koeppl/phoni.
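
For readers unfamiliar with the term, the sketch below illustrates what matching statistics are, using the common definition that entry i is the length of the longest prefix of the pattern suffix starting at i that occurs somewhere in the text. It is a naive, uncompressed illustration only: it is not the PHONI algorithm, it does not use a compressed index, and the function name `matching_statistics` is hypothetical. It does process the pattern left to right in a single pass, which loosely mirrors the streaming behaviour described in the abstract.

```python
def matching_statistics(text: str, pattern: str) -> list[int]:
    """Naive matching statistics (illustration only, not PHONI).

    ms[i] is the length of the longest prefix of pattern[i:]
    that occurs as a substring of text.
    """
    ms = []
    length = 0
    for i in range(len(pattern)):
        # If pattern[i-1 : i-1+k] occurs in text, then pattern[i : i-1+k]
        # occurs too, so ms[i] >= ms[i-1] - 1 and we can resume from there.
        length = max(length - 1, 0)
        # Extend the match one character at a time using a plain
        # substring test (a real index would avoid rescanning the text).
        while (i + length < len(pattern)
               and pattern[i:i + length + 1] in text):
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("GATTACA", "TACAGAT"))
```

A run of small values in the output marks a region of the pattern that has no long match in the text, which is the kind of signal the abstract refers to when it mentions recognizing that a pattern has become incompressible relative to the database.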
