Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries.

Affiliations

Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany.

MPI for Molecular Genetics, Ihnestr. 63, 14195, Berlin, Germany.

Publication information

Genome Biol. 2023 May 31;24(1):131. doi: 10.1186/s13059-023-02971-4.

DOI: 10.1186/s13059-023-02971-4
PMID: 37259161
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10230713/
Abstract

We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools. The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129.
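The abstract does not describe the internals, so the following is only a minimal Python sketch of a flat interleaved Bloom filter, the per-bin building block the name refers to. Each bin gets its own Bloom filter, and the filters are stored so that bit i of every bin sits in one word (`rows[i]`), letting a single lookup report all bins at once. A query counts, per bin, how many of its k-mers are reported present and keeps bins that reach a fixed threshold. The hash function (seeded BLAKE2b), the parameters, the fixed threshold, and the names `InterleavedBloomFilter`, `count_hits` and `search` are illustrative assumptions; the hierarchical layout that distinguishes the HIBF is not modeled.

```python
import hashlib
from typing import Iterator, List


class InterleavedBloomFilter:
    """Flat interleaved Bloom filter: one Bloom filter per bin, bits interleaved.

    rows[i] is a bit vector over bins: bit b of rows[i] is bit i of bin b's
    Bloom filter, so one row lookup answers "which bins have bit i set?".
    """

    def __init__(self, num_bins: int, bin_size: int, num_hashes: int = 2) -> None:
        self.num_bins = num_bins
        self.bin_size = bin_size
        self.num_hashes = num_hashes
        self.rows = [0] * bin_size

    def _positions(self, kmer: str) -> Iterator[int]:
        # Assumption: seeded BLAKE2b stands in for the fast hashes a real index would use.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{kmer}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.bin_size

    def insert(self, kmer: str, bin_id: int) -> None:
        # Set bit `bin_id` in every row the k-mer hashes to.
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_id

    def query(self, kmer: str) -> int:
        # AND the rows of all hash positions; a set bit b means "kmer may be in bin b".
        result = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            result &= self.rows[pos]
        return result


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_hits(ibf: InterleavedBloomFilter, seq: str, k: int) -> List[int]:
    """Per-bin count of query k-mers reported as (possibly) present."""
    counts = [0] * ibf.num_bins
    for kmer in kmers(seq, k):
        bits = ibf.query(kmer)
        for b in range(ibf.num_bins):
            if (bits >> b) & 1:
                counts[b] += 1
    return counts


def search(ibf: InterleavedBloomFilter, seq: str, k: int, threshold: int) -> List[int]:
    """Bins whose k-mer hit count reaches `threshold` (hypothetical fixed threshold)."""
    return [b for b, c in enumerate(count_hits(ibf, seq, k)) if c >= threshold]


if __name__ == "__main__":
    k = 5
    bins = ["ACGTACGTTGCAAGCT", "TTGACCGATAGGCTAA", "GGGCCCAATTTACGGA"]
    ibf = InterleavedBloomFilter(num_bins=len(bins), bin_size=1024, num_hashes=2)
    for bin_id, seq in enumerate(bins):
        for kmer in kmers(seq, k):
            ibf.insert(kmer, bin_id)

    query = "GACCGATAGG"  # substring of bin 1, so all 6 of its 5-mers were inserted there
    print(search(ibf, query, k, threshold=6))
```

Because a Bloom filter has no false negatives, the bin that actually contains the query is always reported; any other bin can appear only through false-positive bits, which is what makes the membership query approximate.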

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/af3fa58bd31a/13059_2023_2971_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/e8df0f276ed9/13059_2023_2971_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/ce35ca4cf30a/13059_2023_2971_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/8f8d9eaf2411/13059_2023_2971_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/20834305be53/13059_2023_2971_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/ea93f63f7684/13059_2023_2971_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/720bb3b989d0/13059_2023_2971_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/6e4d71c036a5/13059_2023_2971_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c85/10230713/0107bc98d04d/13059_2023_2971_Fig9_HTML.jpg
