Suppr超能文献

MQF 和缓冲 MQF:用于高效存储具有计数和元数据的 k-mer 的商滤波器。

MQF and buffered MQF: quotient filters for efficient storage of k-mers with their counts and metadata.

机构信息

Department of Computer Science, University of California, Davis, CA, USA.

Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA, USA.

出版信息

BMC Bioinformatics. 2021 Feb 16;22(1):71. doi: 10.1186/s12859-021-03996-x.

Abstract

BACKGROUND

Specialized data structures are required for online algorithms to efficiently handle large sequencing datasets. The counting quotient filter (CQF), a compact hashtable, can efficiently store k-mers with a skewed distribution.

RESULT

Here, we present the mixed-counters quotient filter (MQF) as a new variant of the CQF with novel counting and labeling systems. The new counting system adapts to a wider range of data distributions for increased space efficiency and is faster than the CQF for insertions and queries in most of the tested scenarios. A buffered version of the MQF can offload storage to disk, trading speed of insertions and queries for a significant memory reduction. The labeling system provides a flexible framework for assigning labels to member items while maintaining good data locality and a concise memory representation. These labels serve as a minimal perfect hash function but are ~ tenfold faster than BBhash, with no need to re-analyze the original data for further insertions or deletions.

CONCLUSIONS

The MQF is a flexible and efficient data structure that extends our ability to work with high throughput sequencing data.

摘要

背景

在线算法需要专门的数据结构才能有效地处理大型测序数据集。计数商滤波器 (CQF) 是一种紧凑的哈希表,可以有效地存储分布不均匀的 k-mer。

结果

在这里,我们提出了混合计数器商滤波器 (MQF),作为 CQF 的一种新变体,具有新的计数和标记系统。新的计数系统适应更广泛的数据分布,以提高空间效率,并且在大多数测试场景中,插入和查询的速度比 CQF 快。MQF 的缓冲版本可以将存储卸载到磁盘上,以显著减少内存为代价换取插入和查询的速度。标记系统为成员项分配标签提供了一个灵活的框架,同时保持了良好的数据局部性和简洁的内存表示。这些标签用作最小完美哈希函数,但比 BBhash 快约十倍,无需为进一步的插入或删除重新分析原始数据。

结论

MQF 是一种灵活高效的数据结构,扩展了我们处理高通量测序数据的能力。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc9a/7885209/4728051de36c/12859_2021_3996_Fig1_HTML.jpg

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验