• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

相似文献

1
An Efficient, Scalable, and Exact Representation of High-Dimensional Color Information Enabled Using de Bruijn Graph Search.使用德布鲁因图搜索实现高维颜色信息的高效、可扩展且精确表示。
J Comput Biol. 2020 Apr;27(4):485-499. doi: 10.1089/cmb.2019.0322. Epub 2020 Mar 16.
2
Building large updatable colored de Bruijn graphs via merging.通过合并构建大型可更新彩色 de Bruijn 图。
Bioinformatics. 2019 Jul 15;35(14):i51-i60. doi: 10.1093/bioinformatics/btz350.
3
A space and time-efficient index for the compacted colored de Bruijn graph.一种用于压缩彩色 de Bruijn 图的空间和时间高效索引。
Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.
4
Simplitigs as an efficient and scalable representation of de Bruijn graphs.Simplitigs 作为一种高效且可扩展的 de Bruijn 图表示方法。
Genome Biol. 2021 Apr 6;22(1):96. doi: 10.1186/s13059-021-02297-z.
5
deGSM: Memory Scalable Construction Of Large Scale de Bruijn Graph.deGSM:大规模 de Bruijn 图的可扩展存储构建。
IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2157-2166. doi: 10.1109/TCBB.2019.2913932. Epub 2021 Dec 8.
6
Integrating long-range connectivity information into de Bruijn graphs.将长程连接信息整合到 de Bruijn 图中。
Bioinformatics. 2018 Aug 1;34(15):2556-2565. doi: 10.1093/bioinformatics/bty157.
7
Meta-colored compacted de Bruijn graphs.元彩色压缩德布鲁因图
bioRxiv. 2023 Nov 1:2023.07.21.550101. doi: 10.1101/2023.07.21.550101.
8
Where the Patterns Are: Repetition-Aware Compression for Colored de Bruijn Graphs.模式所在:带重复感知的彩色 de Bruijn 图压缩。
J Comput Biol. 2024 Oct;31(10):1022-1044. doi: 10.1089/cmb.2024.0714. Epub 2024 Oct 9.
9
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.用于构建大型双向 de Bruijn 图的高效并行和外核算法。
BMC Bioinformatics. 2010 Nov 15;11:560. doi: 10.1186/1471-2105-11-560.
10
Enhanced Compression of -Mer Sets with Counters via de Bruijn Graphs.通过 de Bruijn 图对 -Mer 集进行增强型压缩与计数器。
J Comput Biol. 2024 Jun;31(6):524-538. doi: 10.1089/cmb.2024.0530. Epub 2024 May 31.

引用本文的文献

1
Movi Color: fast and accurate long-read classification with the move structure.Movi Color:利用移动结构进行快速准确的长读长分类。
bioRxiv. 2025 May 27:2025.05.22.655637. doi: 10.1101/2025.05.22.655637.
2
Fast and Scalable Parallel External-Memory Construction of Colored Compacted de Bruijn Graphs with Cuttlefish 3.使用Cuttlefish 3进行彩色压缩德布鲁因图的快速且可扩展的并行外部内存构建
bioRxiv. 2025 Feb 6:2025.02.02.636161. doi: 10.1101/2025.02.02.636161.
3
Where the Patterns Are: Repetition-Aware Compression for Colored de Bruijn Graphs.模式所在:带重复感知的彩色 de Bruijn 图压缩。
J Comput Biol. 2024 Oct;31(10):1022-1044. doi: 10.1089/cmb.2024.0714. Epub 2024 Oct 9.
4
Compression algorithm for colored de Bruijn graphs.彩色德布鲁因图的压缩算法。
Algorithms Mol Biol. 2024 May 26;19(1):20. doi: 10.1186/s13015-024-00254-6.
5
Compression Algorithm for Colored de Bruijn Graphs.彩色德布鲁因图的压缩算法
Lebniz Int Proc Inform. 2023 Sep;273. doi: 10.4230/LIPIcs.WABI.2023.17. Epub 2023 Aug 29.
6
Fulgor: a fast and compact k-mer index for large-scale matching and color queries.Fulgor:一种用于大规模匹配和颜色查询的快速紧凑的k-mer索引。
Algorithms Mol Biol. 2024 Jan 22;19(1):3. doi: 10.1186/s13015-024-00251-9.
7
Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries.分层交错布隆过滤器:实现超快速、近似的序列查询。
Genome Biol. 2023 May 31;24(1):131. doi: 10.1186/s13059-023-02971-4.
8
Fulgor: A fast and compact -mer index for large-scale matching and color queries.富尔戈尔:一种用于大规模匹配和颜色查询的快速紧凑的k-mer索引。
bioRxiv. 2023 May 20:2023.05.09.539895. doi: 10.1101/2023.05.09.539895.
9
An incrementally updatable and scalable system for large-scale sequence search using the Bentley-Saxe transformation.一种使用本特利-萨克斯变换进行大规模序列搜索的增量可更新且可扩展的系统。
Bioinformatics. 2022 Jun 13;38(12):3155-3163. doi: 10.1093/bioinformatics/btac142.
10
VariantStore: an index for large-scale genomic variant search.变体存储:用于大规模基因组变体搜索的索引。
Genome Biol. 2021 Aug 19;22(1):231. doi: 10.1186/s13059-021-02442-8.

本文引用的文献

1
Ultrafast search of all deposited bacterial and viral genomic data.快速搜索所有已存入的细菌和病毒基因组数据。
Nat Biotechnol. 2019 Feb;37(2):152-159. doi: 10.1038/s41587-018-0010-1. Epub 2019 Feb 4.
2
SeqOthello: querying RNA-seq experiments at scale.SeqOthello:大规模查询 RNA-seq 实验。
Genome Biol. 2018 Oct 19;19(1):167. doi: 10.1186/s13059-018-1535-9.
3
Dynamic compression schemes for graph coloring.用于图着色的动态压缩方案。
Bioinformatics. 2019 Feb 1;35(3):407-414. doi: 10.1093/bioinformatics/bty632.
4
A space and time-efficient index for the compacted colored de Bruijn graph.一种用于压缩彩色 de Bruijn 图的空间和时间高效索引。
Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.
5
Practical dynamic de Bruijn graphs.实用动态 de Bruijn 图。
Bioinformatics. 2018 Dec 15;34(24):4189-4195. doi: 10.1093/bioinformatics/bty500.
6
Mantis: A Fast, Small, and Exact Large-Scale Sequence-Search Index.螳螂:一种快速、小巧、精确的大规模序列搜索索引。
Cell Syst. 2018 Aug 22;7(2):201-207.e4. doi: 10.1016/j.cels.2018.05.021. Epub 2018 Jun 20.
7
Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees.使用拆分序列布隆树改进对大型转录组测序数据库的搜索
J Comput Biol. 2018 Jul;25(7):755-765. doi: 10.1089/cmb.2017.0265. Epub 2018 Mar 12.
8
AllSome Sequence Bloom Trees.所有一些序列布隆树。
J Comput Biol. 2018 May;25(5):467-479. doi: 10.1089/cmb.2017.0258. Epub 2018 Apr 5.
9
Integrating long-range connectivity information into de Bruijn graphs.将长程连接信息整合到 de Bruijn 图中。
Bioinformatics. 2018 Aug 1;34(15):2556-2565. doi: 10.1093/bioinformatics/bty157.
10
Squeakr: an exact and approximate k-mer counting system.Squeakr:一种精确和近似的 k-mer 计数系统。
Bioinformatics. 2018 Feb 15;34(4):568-575. doi: 10.1093/bioinformatics/btx636.

使用德布鲁因图搜索实现高维颜色信息的高效、可扩展且精确表示。

An Efficient, Scalable, and Exact Representation of High-Dimensional Color Information Enabled Using de Bruijn Graph Search.

作者信息

Almodaresi Fatemeh, Pandey Prashant, Ferdman Michael, Johnson Rob, Patro Rob

机构信息

Department of Computer Science, University of Maryland, College Park, Maryland.

School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania.

出版信息

J Comput Biol. 2020 Apr;27(4):485-499. doi: 10.1089/cmb.2019.0322. Epub 2020 Mar 16.

DOI:10.1089/cmb.2019.0322
PMID:32176522
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC7185321/
Abstract

The colored de Bruijn graph (cdbg) and its variants have become an important combinatorial structure used in numerous areas in genomics, such as population-level variation detection in metagenomic samples, large-scale sequence search, and cdbg-based reference sequence indices. As samples or genomes are added to the cdbg, the color information comes to dominate the space required to represent this data structure. In this article, we show how to represent the color information efficiently by adopting a hierarchical encoding that exploits correlations among color classes-patterns of color occurrence-present in the de Bruijn graph (dbg). A major challenge in deriving an efficient encoding of the color information that takes advantage of such correlations is determining which color classes are close to each other in the high-dimensional space of possible color patterns. We demonstrate that the dbg itself can be used as an efficient mechanism to search for approximate nearest neighbors in this space. While our approach reduces the encoding size of the color information even for relatively small cdbgs (hundreds of experiments), the gains are particularly consequential as the number of potential colors (i.e., samples or references) grows into thousands. We apply this encoding in the context of two different applications; the implicit cdbg used for a large-scale sequence search index, Mantis, as well as the encoding of color information used in population-level variation detection by tools such as Vari and Rainbowfish. Our results show significant improvements in the overall size and scalability of representation of the color information. In our experiment on 10,000 samples, we achieved >11 × better compression compared to Ramen, Ramen, Rao (RRR).

摘要

彩色德布鲁因图(cdbg)及其变体已成为基因组学众多领域中使用的一种重要组合结构,例如宏基因组样本中的群体水平变异检测、大规模序列搜索以及基于cdbg的参考序列索引。随着样本或基因组被添加到cdbg中,颜色信息开始主导表示此数据结构所需的空间。在本文中,我们展示了如何通过采用分层编码来高效表示颜色信息,该编码利用了德布鲁因图(dbg)中颜色类之间的相关性——颜色出现的模式。在推导利用此类相关性的颜色信息高效编码时,一个主要挑战是确定在可能的颜色模式的高维空间中哪些颜色类彼此接近。我们证明dbg本身可以用作在此空间中搜索近似最近邻的有效机制。虽然我们的方法即使对于相对较小的cdbg(数百个实验)也能减少颜色信息的编码大小,但随着潜在颜色的数量(即样本或参考)增长到数千个,收益尤为显著。我们在两种不同的应用场景中应用此编码;用于大规模序列搜索索引Mantis的隐式cdbg,以及Vari和Rainbowfish等工具在群体水平变异检测中使用的颜色信息编码。我们的结果表明,颜色信息表示的整体大小和可扩展性有显著改进。在我们对10000个样本的实验中,与Ramen、Rao(RRR)相比,我们实现了超过11倍的更好压缩。