
Efficient minimizer orders for large values of k using minimum decycling sets.

Affiliations

Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801, Israel.

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.

Publication information

Genome Res. 2023 Jul;33(7):1154-1161. doi: 10.1101/gr.277644.123. Epub 2023 Aug 9.

Abstract

Minimizers are ubiquitously used in data structures and algorithms for efficient searching, mapping, and indexing of high-throughput DNA sequencing data. Minimizer schemes select a minimum k-mer in every L-long subsequence of the target sequence, where minimality is with respect to a predefined k-mer order. Commonly used minimizer orders select more k-mers than necessary and therefore provide limited improvement in runtime and memory usage of downstream analysis tasks. The recently introduced universal k-mer hitting sets produce minimizer orders with fewer selected k-mers. Generating compact universal k-mer hitting sets is currently infeasible for k > 13, and thus, they cannot help in the many applications that require minimizer orders for larger k. Here, we close the gap of efficient minimizer orders for large values of k by introducing decycling-set-based minimizer orders: new minimizer orders based on minimum decycling sets. We show that in practice these new minimizer orders select a number of k-mers comparable to that of minimizer orders based on universal k-mer hitting sets and can also scale to a larger k. Furthermore, we developed a method that computes the minimizers in a sequence on the fly without keeping the k-mers of a decycling set in memory. This enables the use of these minimizer orders for any value of k. We expect the new orders to improve the runtime and memory usage of algorithms and data structures in high-throughput DNA sequencing analysis.
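
To make the selection rule concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It picks the minimum k-mer in every window of length L under a pluggable k-mer order, and shows one way an order can be derived from a set: k-mers in the set rank before all others. The names `minimizers`, `set_first_order`, and `in_set` are hypothetical. The membership test is passed in as a function, which loosely mirrors the idea of deciding membership on the fly instead of storing the set, but the actual decycling-set membership test is not reproduced here.

```python
from typing import Callable, List, Tuple


def minimizers(seq: str, k: int, L: int,
               key: Callable[[str], object]) -> List[Tuple[int, str]]:
    """Return the distinct (position, k-mer) pairs chosen as the minimum
    k-mer of each L-long window of seq, under the order given by `key`."""
    if not (0 < k <= L <= len(seq)):
        raise ValueError("need 0 < k <= L <= len(seq)")
    selected: List[Tuple[int, str]] = []
    for start in range(len(seq) - L + 1):
        # Start positions of all k-mers inside the window seq[start:start+L].
        positions = range(start, start + L - k + 1)
        # Ties between equal k-mers are broken by the leftmost position.
        best = min(positions, key=lambda i: (key(seq[i:i + k]), i))
        # Consecutive windows often share a minimizer; record it once.
        if not selected or selected[-1][0] != best:
            selected.append((best, seq[best:best + k]))
    return selected


def set_first_order(in_set: Callable[[str], bool]) -> Callable[[str], tuple]:
    """Hypothetical helper: k-mers for which in_set() is true rank before
    all others; lexicographic order is used within each class."""
    return lambda kmer: (0 if in_set(kmer) else 1, kmer)


# Toy usage. The set {"GT"} is an arbitrary example, NOT a decycling set.
lexicographic = minimizers("ACGTACGT", k=2, L=4, key=lambda kmer: kmer)
set_based = minimizers("ACGTACGT", k=2, L=4,
                       key=set_first_order(lambda kmer: kmer in {"GT"}))
```

The fewer positions such a function returns for a given sequence, the sparser the resulting index; the choice of order (here, the `key` argument) is what determines that count.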


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8feb/10538483/63cc2822acff/1154f01.jpg
