Enhanced Compression of k-Mer Sets with Counters via de Bruijn Graphs.

Affiliation

Department of Information Engineering, University of Padua, Padua, Italy.

Publication information

J Comput Biol. 2024 Jun;31(6):524-538. doi: 10.1089/cmb.2024.0530. Epub 2024 May 31.

Abstract

An essential task in computational genomics involves transforming input sequences into their constituent k-mers. The quest for an efficient representation of k-mer sets is crucial for enhancing the scalability of bioinformatic analyses. One widely used method involves converting the k-mer set into a de Bruijn graph (dBG), followed by seeking a compact graph representation via the smallest path cover. This study introduces USTAR* (Unitig STitch Advanced constRuction), a tool designed to compress both a set of k-mers and their associated counts. USTAR leverages the connectivity and density of dBGs, enabling a more efficient path selection for constructing the path cover. The efficacy of USTAR is demonstrated through its application in compressing real read data sets. USTAR improves the compression achieved by UST (Unitig STitch), the best algorithm, by percentages ranging from 2.3% to 26.4%, depending on the k-mer size, and it is up to times faster.
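
To make the pipeline in the abstract concrete, here is a minimal sketch of the general idea: count k-mers, treat them as nodes of a de Bruijn graph, and cover the graph with a small number of paths, each spelled as one string. This is not USTAR's or UST's path-selection heuristic; it is a plain greedy extension, written under the simplifying assumptions that only the forward strand is considered (no reverse complements) and that the alphabet is ACGT. The function names `count_kmers`, `greedy_path_cover`, and `counts_along` are hypothetical and chosen only for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, forward strand only


def count_kmers(reads, k):
    """Count every length-k substring across the input reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_path_cover(kmers):
    """Cover a k-mer set with strings by greedily extending dBG paths.

    Nodes are k-mers; an edge u -> v exists when u[1:] == v[:-1].
    Each k-mer ends up in exactly one returned string.
    """
    unused = set(kmers)
    paths = []
    for seed in sorted(unused):  # sorted only to make the output deterministic
        if seed not in unused:
            continue
        unused.remove(seed)
        k = len(seed)
        path = seed
        # Extend to the right while an unused successor k-mer exists.
        extended = True
        while extended:
            extended = False
            suffix = path[len(path) - k + 1:]
            for c in ALPHABET:
                if suffix + c in unused:
                    unused.remove(suffix + c)
                    path += c
                    extended = True
                    break
        # Extend to the left while an unused predecessor k-mer exists.
        extended = True
        while extended:
            extended = False
            prefix = path[:k - 1]
            for c in ALPHABET:
                if c + prefix in unused:
                    unused.remove(c + prefix)
                    path = c + path
                    extended = True
                    break
        paths.append(path)
    return paths


def counts_along(path, counts, k):
    """List the counts of a path's k-mers in the order they appear."""
    return [counts[path[i:i + k]] for i in range(len(path) - k + 1)]


counts = count_kmers(["ACGTAC", "CGTA"], 3)
paths = greedy_path_cover(counts)
print(paths)
print([counts_along(p, counts, 3) for p in paths])
```

In this toy input the four distinct 3-mers (12 characters if stored separately) collapse into a single 6-character string, and the counts can be stored as one sequence in path order. How the paths are chosen affects both the total string length and how compressible the count sequence is, which is the part a tool like USTAR aims to improve.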
