Fast de Bruijn Graph Compaction in Distributed Memory Environments.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2020 Jan-Feb;17(1):136-148. doi: 10.1109/TCBB.2018.2858797. Epub 2018 Jul 31.

Abstract

De Bruijn graph-based genome assembly has gained popularity as short-read sequencers have become ubiquitous. A core assembly operation is the generation of unitigs, which are sequences corresponding to chains in the graph. Unitigs are used as building blocks for generating longer sequences in many assemblers, and can facilitate graph compression. Chain compaction, by which unitigs are generated, remains a critical computational task. In this paper, we present a distributed memory parallel algorithm for simultaneous compaction of all chains in bi-directed de Bruijn graphs. The key advantages of our algorithm include bounding the chain compaction run-time to a number of iterations logarithmic in the length of the longest chain, and the ability to differentiate cycles from chains within a number of iterations logarithmic in the length of the longest cycle. Our algorithm scales to thousands of computational cores, and can compact a whole-genome de Bruijn graph from a human sequence read set in 7.3 seconds using 7680 distributed memory cores, and in 12.9 minutes using 64 shared memory cores. It is 3.7× and 2.0× faster than equivalent steps in the state-of-the-art tools for distributed and shared memory environments, respectively. An implementation of the algorithm is available at https://github.com/ParBLiSS/bruno.
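To make the logarithmic iteration bounds concrete, the sketch below uses pointer doubling, a standard parallel list-ranking technique. It is an illustration under simplifying assumptions and is not taken from the paper or from the bruno code base. It assumes a plain directed successor map rather than a bi-directed graph, distinct and comparable node ids, and at most one predecessor per node. The function name `compact_chains` and the minimum-id test for cycles are hypothetical choices made for this example. It runs sequentially, with each loop pass standing in for one synchronous parallel round.

```python
def compact_chains(succ):
    """Rank nodes of disjoint chains and cycles by pointer doubling.

    succ maps each node id to its successor, or to None at a chain end.
    Assumed (not checked): ids are distinct and comparable, every
    successor is itself a key, and each node has at most one predecessor.
    Returns (end_of, hops_to_end, cyclic, rounds).
    """
    ends = {v for v, nxt in succ.items() if nxt is None}
    # Chain ends point to themselves so that jumps past them are no-ops.
    ptr = {v: (v if nxt is None else nxt) for v, nxt in succ.items()}
    dist = {v: (0 if v in ends else 1) for v in succ}  # hops covered by ptr[v]
    low = {v: v for v in succ}  # smallest id among the nodes covered so far
    cyclic = set()
    rounds = 0
    while True:
        pending = False
        for v in succ:
            if v in ends or ptr[v] in ends or v in cyclic:
                continue  # already resolved
            if low[v] == low[ptr[v]]:
                # Two consecutive spans share their minimum id. On a chain
                # the spans are disjoint, so this only happens on a cycle.
                cyclic.add(v)
            else:
                pending = True
        if not pending:
            break
        # One synchronous round: every pointer doubles its reach.
        # The right-hand side is built entirely from the old tables.
        ptr, dist, low = (
            {v: ptr[ptr[v]] for v in succ},
            {v: dist[v] + dist[ptr[v]] for v in succ},
            {v: min(low[v], low[ptr[v]]) for v in succ},
        )
        rounds += 1
    end_of = {v: ptr[v] for v in succ if v not in cyclic}
    hops_to_end = {v: dist[v] for v in succ if v not in cyclic}
    return end_of, hops_to_end, cyclic, rounds


# Toy input: one chain of 3-mers and one 3-node cycle of integer ids.
kmers = {"ACG": "CGT", "CGT": "GTA", "GTA": None}
_, hops, _, _ = compact_chains(kmers)
ordered = sorted(kmers, key=lambda k: -hops[k])
# Spell the unitig: first k-mer, then the last base of each following k-mer.
unitig = ordered[0] + "".join(k[-1] for k in ordered[1:])
print(unitig)

print(compact_chains({10: 11, 11: 12, 12: 10})[2])
```

In this sketch a chain of L edges resolves in about log2(L) rounds, and every node on a cycle of length c is flagged once the doubled span covers the cycle, that is, within about log2(c) rounds. In a distributed memory setting each round would involve communication between the processes that own `v` and `ptr[v]`, and that part is omitted here.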

