Data Set-Adaptive Minimizer Order Reduces Memory Usage in k-Mer Counting.

Author information

Flomin Dan, Pellow David, Shamir Ron

Affiliation

Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.

Publication information

J Comput Biol. 2022 Aug;29(8):825-838. doi: 10.1089/cmb.2021.0599. Epub 2022 May 6.

Abstract

The rapid, continuous growth of deep sequencing experiments requires the development and improvement of many bioinformatic applications for analyzing large sequencing data sets, including k-mer counting and assembly. Several applications reduce memory usage by binning sequences. Binning is done with minimizer schemes, which rely on a specific order of the minimizers. It has been demonstrated that the choice of order has a major impact on the performance of these applications. Here we introduce a method for tailoring the order to the data set. Our method repeatedly samples the data set and modifies the order so as to flatten the k-mer load distribution across minimizers. We integrated our method into Gerbil, a state-of-the-art memory-efficient k-mer counter, and were able to reduce its memory footprint by 30%-50% for large k, with only a minor increase in runtime. Our tests also showed that the orders produced by our method gave superior results when transferred across data sets from the same species, with little or no order change. This enables memory reduction with essentially no increase in runtime.
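
To make the binning idea concrete, the following is a minimal Python sketch of minimizer-based binning and of one possible way to adapt the order to a sample. It is not the authors' algorithm and not Gerbil's implementation: the function names (`all_mmers`, `minimizer`, `bin_loads`, `adapt_order`), the greedy "demote the heaviest minimizer" rule, and the use of the peak bin load as the flatness measure are all assumptions made for illustration. The abstract only states that the method samples the data repeatedly and modifies the order to flatten the k-mer load across minimizers.

```python
from collections import Counter
from itertools import product


def all_mmers(m):
    """All DNA strings of length m, in lexicographic order."""
    return ["".join(p) for p in product("ACGT", repeat=m)]


def minimizer(kmer, m, rank):
    """Return the m-mer inside `kmer` that has the lowest rank in the order."""
    return min((kmer[i:i + m] for i in range(len(kmer) - m + 1)),
               key=rank.__getitem__)


def bin_loads(reads, k, m, rank):
    """Count how many k-mers fall into each minimizer's bin.

    Assumes reads contain only A, C, G, T (no N or other symbols).
    """
    loads = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            loads[minimizer(read[i:i + k], m, rank)] += 1
    return loads


def adapt_order(sample, k, m, rounds=10):
    """Hypothetical greedy adaptation: repeatedly demote the heaviest minimizer.

    Starts from the lexicographic order, evaluates the bin loads on `sample`,
    moves the most loaded minimizer to the end of the order, and repeats.
    Demotion can make another bin heavier, so the order with the smallest
    peak load seen so far is the one returned.
    """
    rank = {x: i for i, x in enumerate(all_mmers(m))}
    best_rank, best_peak = dict(rank), None
    for _ in range(rounds + 1):
        loads = bin_loads(sample, k, m, rank)
        if not loads:  # sample has no k-mers of length k
            break
        heaviest, peak = loads.most_common(1)[0]
        if best_peak is None or peak < best_peak:
            best_rank, best_peak = dict(rank), peak
        # Give the heaviest minimizer the worst (largest) rank.
        rank[heaviest] = max(rank.values()) + 1
    return best_rank


if __name__ == "__main__":
    reads = ["AAAAAAAAAACGTACGTTGCA", "ACACACACGTGTGTGT"]  # toy data
    lex = {x: i for i, x in enumerate(all_mmers(2))}
    tuned = adapt_order(reads, k=5, m=2, rounds=8)
    print("peak load, lexicographic:", max(bin_loads(reads, 5, 2, lex).values()))
    print("peak load, adapted:      ", max(bin_loads(reads, 5, 2, tuned).values()))
```

In a k-mer counter that processes one bin at a time, the largest bin bounds the working memory, which is why this sketch tracks the peak load. A real implementation would draw fresh samples from the full data set rather than reuse one fixed list of reads.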
