The backpack quotient filter: A dynamic and space-efficient data structure for querying k-mers with abundance.

Authors

Levallois Victor, Andreace Francesco, Le Gal Bertrand, Dufresne Yoann, Peterlongo Pierre

Affiliations

University Rennes, Inria, CNRS, IRISA - UMR 6074, 35000 Rennes, France.

Department of Computational Biology, Institut Pasteur, Université Paris Cité, 75015 Paris, France.

Publication information

iScience. 2024 Nov 23;27(12):111435. doi: 10.1016/j.isci.2024.111435. eCollection 2024 Dec 20.

DOI:10.1016/j.isci.2024.111435

PMID:39720533

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11667073/

Abstract

Genomic data sequencing is crucial for understanding biological systems. As genomic databases like the European Nucleotide Archive expand exponentially, efficient data manipulation is essential. A key challenge is querying these databases to determine the presence or absence of specific sequences and their abundance within datasets. This paper presents the Backpack Quotient Filter (BQF), a data structure for indexing k-mers (substrings of length k), which offers greater space efficiency than the Counting Quotient Filter (CQF). The BQF maintains essential features such as abundance information and dynamicity, with an extremely low false positive rate of less than [value not given]. Our method redefines abundance information handling and implements an independent strategy for space efficiency. The BQF uses four times less space than the CQF on complex datasets such as sea-water metagenomics sequences. Additionally, its space efficiency improves with larger datasets, addressing the need for scalable data solutions.
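
To make the query in the abstract concrete, the following minimal Python sketch enumerates the k-mers of a few toy sequences and reports the abundance of a queried k-mer. It is an exact, dictionary-based baseline written for illustration only; it is not the BQF or the CQF, which answer the same kind of query approximately and in far less space. The function names (`kmers`, `count_kmers`, `query`) and the toy reads are assumptions introduced here, not taken from the paper.

```python
from collections import Counter


def kmers(sequence: str, k: int):
    """Yield every substring of length k, left to right."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def count_kmers(sequences, k: int) -> Counter:
    """Exact k-mer abundances over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))
    return counts


def query(counts: Counter, kmer: str) -> int:
    """Return the abundance of kmer, or 0 if it is absent."""
    return counts[kmer]


# Toy input chosen for illustration, not real sequencing data.
reads = ["ACGTACGT", "CGTAC"]
index = count_kmers(reads, k=4)

print(query(index, "ACGT"))  # present: occurs twice in the first read
print(query(index, "TTTT"))  # absent: abundance 0
```

An exact index like this one stores every distinct k-mer in full, which is the cost that space-efficient filters such as the CQF and BQF are designed to avoid.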
