• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

惠勒图:基于Burrows-Wheeler变换的数据结构框架。

Wheeler graphs: A framework for BWT-based data structures.

作者信息

Gagie Travis, Manzini Giovanni, Sirén Jouni

机构信息

Diego Portales University and CEBIB, Santiago, Chile.

University of Eastern Piedmont, Alessandria, Italy.

出版信息

Theor Comput Sci. 2017 Oct 25;698:67-78. doi: 10.1016/j.tcs.2017.06.016.

DOI:10.1016/j.tcs.2017.06.016
PMID:29276331
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5727778/
Abstract

The famous Burrows-Wheeler Transform (BWT) was originally defined for a single string but variations have been developed for sets of strings, labeled trees, de Bruijn graphs, etc. In this paper we propose a framework that includes many of these variations and that we hope will simplify the search for more. We first define and show they have a property we call . We show that if the state diagram of a finite-state automaton is a Wheeler graph then, by its path coherence, we can order the nodes such that, for any string, the nodes reachable from the initial state or states by processing that string are consecutive. This means that even if the automaton is non-deterministic, we can still store it compactly and process strings with it quickly. We then rederive several variations of the BWT by designing straightforward finite-state automata for the relevant problems and showing that their state diagrams are Wheeler graphs.

摘要

著名的Burrows-Wheeler变换(BWT)最初是为单个字符串定义的,但已经针对字符串集、标记树、德布鲁因图等开发了变体。在本文中,我们提出了一个框架,其中包括许多这些变体,并且我们希望这将简化对更多变体的搜索。我们首先定义并展示它们具有我们称为 的属性。我们表明,如果有限状态自动机的状态图是惠勒图,那么通过其路径连贯性,我们可以对节点进行排序,使得对于任何字符串,通过处理该字符串从初始状态或多个初始状态可达的节点是连续的。这意味着即使自动机是非确定性的,我们仍然可以紧凑地存储它并快速用它处理字符串。然后,我们通过为相关问题设计直接的有限状态自动机并表明它们的状态图是惠勒图,重新推导了BWT的几个变体。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/ffe111d516ca/gr011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/d067478b9a6c/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/8a57733acced/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/93ed860cb57d/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/a96056f31e84/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/5c5f747f4f6c/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/3bbfb02792d4/gr006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/df21cdb0b3e0/gr007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/556afdf78ecc/gr008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/89f04963e474/gr009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/e0465cd09455/gr010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/ffe111d516ca/gr011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/d067478b9a6c/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/8a57733acced/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/93ed860cb57d/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/a96056f31e84/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/5c5f747f4f6c/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/3bbfb02792d4/gr006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/df21cdb0b3e0/gr007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/556afdf78ecc/gr008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/89f04963e474/gr009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/e0465cd09455/gr010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95f9/5727778/ffe111d516ca/gr011.jpg

相似文献

1
Wheeler graphs: A framework for BWT-based data structures.惠勒图:基于Burrows-Wheeler变换的数据结构框架。
Theor Comput Sci. 2017 Oct 25;698:67-78. doi: 10.1016/j.tcs.2017.06.016.
2
Haplotype Matching with GBWT for Pangenome Graphs.用于泛基因组图的基于广义布隆游走树的单倍型匹配
bioRxiv. 2025 Feb 7:2025.02.03.634410. doi: 10.1101/2025.02.03.634410.
3
Computing matching statistics on Wheeler DFAs.计算惠勒确定有限自动机上的匹配统计信息。
Proc Data Compress Conf. 2023 Mar;2023:150-159. doi: 10.1109/dcc55655.2023.00023. Epub 2023 May 19.
4
deBWT: parallel construction of Burrows-Wheeler Transform for large collection of genomes with de Bruijn-branch encoding.deBWT:用于大量基因组集合的具有德布鲁因分支编码的Burrows-Wheeler变换的并行构建。
Bioinformatics. 2016 Jun 15;32(12):i174-i182. doi: 10.1093/bioinformatics/btw266.
5
Space-time Trade-offs for the LCP Array of Wheeler DFAs.惠勒确定有限自动机(DFA)的LCP数组的时空权衡
Int Symp String Process Inf Retr. 2023 Sep;14240:143-156. doi: 10.1007/978-3-031-43980-3_12. Epub 2023 Sep 20.
6
External memory BWT and LCP computation for sequence collections with applications.用于序列集合的外部内存BWT和LCP计算及其应用
Algorithms Mol Biol. 2019 Mar 8;14:6. doi: 10.1186/s13015-019-0140-0. eCollection 2019.
7
Indexing Graphs for Path Queries with Applications in Genome Research.用于路径查询的图索引及其在基因组研究中的应用
IEEE/ACM Trans Comput Biol Bioinform. 2014 Mar-Apr;11(2):375-88. doi: 10.1109/TCBB.2013.2297101.
8
Variable-order reference-free variant discovery with the Burrows-Wheeler Transform.基于 Burrows-Wheeler 变换的变阶无参考变异发现。
BMC Bioinformatics. 2020 Sep 16;21(Suppl 8):260. doi: 10.1186/s12859-020-03586-3.
9
Heuristics for the run-length encoded Burrows-Wheeler transform alphabet ordering problem.游程编码的Burrows-Wheeler变换字母表排序问题的启发式算法。
J Heuristics. 2025;31(1):11. doi: 10.1007/s10732-025-09548-3. Epub 2025 Jan 28.
10
LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly.LSG:一种用于为下一代测序数据组装计算字符串图的外部存储工具。
J Comput Biol. 2016 Mar;23(3):137-49. doi: 10.1089/cmb.2015.0172.

引用本文的文献

1
Lossless Pangenome Indexing Using Tag Arrays.使用标签数组的无损全基因组索引
bioRxiv. 2025 May 15:2025.05.12.653561. doi: 10.1101/2025.05.12.653561.
2
GIN-TONIC: non-hierarchical full-text indexing for graph genomes.GIN-TONIC:用于图基因组的非分层全文索引
NAR Genom Bioinform. 2024 Dec 11;6(4):lqae159. doi: 10.1093/nargab/lqae159. eCollection 2024 Dec.
3
Space-time Trade-offs for the LCP Array of Wheeler DFAs.惠勒确定有限自动机(DFA)的LCP数组的时空权衡

本文引用的文献

1
A graph extension of the positional Burrows-Wheeler transform and its applications.位置布罗算法变换的图形扩展及其应用
Algorithms Mol Biol. 2017 Jul 11;12:18. doi: 10.1186/s13015-017-0109-9. eCollection 2017.
2
Indexing Graphs for Path Queries with Applications in Genome Research.用于路径查询的图索引及其在基因组研究中的应用
IEEE/ACM Trans Comput Biol Bioinform. 2014 Mar-Apr;11(2):375-88. doi: 10.1109/TCBB.2013.2297101.
3
On the representation of de Bruijn graphs.关于德布鲁因图的表示。
Int Symp String Process Inf Retr. 2023 Sep;14240:143-156. doi: 10.1007/978-3-031-43980-3_12. Epub 2023 Sep 20.
4
Computing matching statistics on Wheeler DFAs.计算惠勒确定有限自动机上的匹配统计信息。
Proc Data Compress Conf. 2023 Mar;2023:150-159. doi: 10.1109/dcc55655.2023.00023. Epub 2023 May 19.
5
Measuring, visualizing, and diagnosing reference bias with biastools.使用 biastools 测量、可视化和诊断参考偏倚。
Genome Biol. 2024 Apr 19;25(1):101. doi: 10.1186/s13059-024-03240-8.
6
Measuring, visualizing and diagnosing reference bias with biastools.使用偏差工具测量、可视化和诊断参考偏差。
bioRxiv. 2024 Feb 15:2023.09.13.557552. doi: 10.1101/2023.09.13.557552.
7
WGT: Tools and algorithms for recognizing, visualizing, and generating Wheeler graphs.WGT:用于识别、可视化和生成惠勒图的工具与算法。
iScience. 2023 Jul 14;26(8):107402. doi: 10.1016/j.isci.2023.107402. eCollection 2023 Aug 18.
8
Computational graph pangenomics: a tutorial on data structures and their applications.计算图泛基因组学:数据结构及其应用教程
Nat Comput. 2022 Mar;21(1):81-108. doi: 10.1007/s11047-022-09882-6. Epub 2022 Mar 4.
9
Population-scale genotyping of structural variation in the era of long-read sequencing.长读长测序时代结构变异的群体规模基因分型
Comput Struct Biotechnol J. 2022 May 27;20:2639-2647. doi: 10.1016/j.csbj.2022.05.047. eCollection 2022.
10
Bacterial genomic epidemiology with mixed samples.混合样本的细菌基因组流行病学研究。
Microb Genom. 2021 Nov;7(11). doi: 10.1099/mgen.0.000691.
J Comput Biol. 2015 May;22(5):336-52. doi: 10.1089/cmb.2014.0160. Epub 2015 Jan 28.
4
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph.MEGAHIT:通过简洁的 de Bruijn 图实现的超快速单节点解决方案,适用于大型和复杂的宏基因组组装。
Bioinformatics. 2015 May 15;31(10):1674-6. doi: 10.1093/bioinformatics/btv033. Epub 2015 Jan 20.
5
Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT).利用位置 Burrows-Wheeler 变换 (PBWT) 实现高效单倍型匹配和存储。
Bioinformatics. 2014 May 1;30(9):1266-72. doi: 10.1093/bioinformatics/btu014. Epub 2014 Jan 9.