Department of Computer Science, University of Maryland, College Park, USA.
Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA.
Genome Biol. 2022 Sep 8;23(1):190. doi: 10.1186/s13059-022-02743-6.
The de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this often forms a computational bottleneck. We present Cuttlefish 2, significantly advancing the state-of-the-art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, of size 2.58 Tbp, from 4.5 days to 17–23 h; and it constructs the graph for 1.52 Tbp of white spruce reads in approximately 10 h, while the closest competitor requires 54–58 h, using considerably more memory.
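To make the data structure concrete, the following is a minimal Python sketch of a node-centric de Bruijn graph (nodes are k-mers, with an edge wherever two present k-mers overlap by k-1 characters) and its compaction into unitigs, i.e. maximal non-branching paths. It is a naive hash-set construction for illustration only, not the Cuttlefish 2 algorithm. The function names build_dbg and compact are hypothetical. For simplicity it assumes single-stranded k-mers, whereas genomic tools typically treat a k-mer and its reverse complement as the same node.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_dbg(seqs, k):
    """Naive node-centric de Bruijn graph (hypothetical helper).

    Nodes are the distinct k-mers of the input; an edge u -> v exists when
    both are present and u[1:] == v[:-1]. Assumption: single-stranded k-mers
    (reverse complements are NOT merged).
    """
    nodes = set()
    for s in seqs:
        nodes.update(kmers(s, k))
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u in nodes:
        for c in "ACGT":
            v = u[1:] + c  # candidate successor sharing a (k-1)-overlap
            if v in nodes:
                succ[u].add(v)
                pred[v].add(u)
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Return the unitigs (maximal non-branching paths) as spelled strings."""

    def is_start(u):
        # u begins a unitig unless it has exactly one predecessor
        # and that predecessor has exactly one successor.
        if len(pred[u]) != 1:
            return True
        (p,) = pred[u]
        return len(succ[p]) != 1

    visited = set()
    unitigs = []

    def walk(start):
        path = [start]
        visited.add(start)
        cur = start
        # Extend while the path stays non-branching in both directions.
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            path.append(nxt)
            visited.add(nxt)
            cur = nxt
        # Spell the path: first k-mer plus the last character of each later k-mer.
        return path[0] + "".join(v[-1] for v in path[1:])

    for u in sorted(nodes):
        if u not in visited and is_start(u):
            unitigs.append(walk(u))
    # Any node still unvisited lies on an isolated cycle with no start point.
    for u in sorted(nodes):
        if u not in visited:
            unitigs.append(walk(u))
    return unitigs
```

For example, with k = 3 the reads "ACGTT" and "CGTA" share the k-mer CGT, which branches to GTA and GTT, so compact(*build_dbg(["ACGTT", "CGTA"], 3)) yields the three unitigs ["ACGT", "GTA", "GTT"]. Holding every k-mer in an in-memory set like this is exactly what becomes impractical at the terabase scale the abstract describes.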