Shivakumar Vikram S, Langmead Ben
Department of Computer Science, Johns Hopkins University.
bioRxiv. 2025 May 25:2025.05.20.654611. doi: 10.1101/2025.05.20.654611.
Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly sequenced assemblies. We previously developed Mumemto, which computes multi-maximal unique matches (multi-MUMs) across pangenomes using compressed indexing. In this work, we extend Mumemto by introducing two new partitioning and merging strategies. Both strategies enable highly parallel, memory-efficient, and updateable computation of multi-MUMs. One of the strategies, called string-based merging, can also conduct the merges in an order that follows the shape of a phylogenetic tree, naturally yielding the multi-MUMs for the tree's internal nodes as well as for the root. With these strategies, Mumemto now scales to 474 human haplotypes, making it the only multi-MUM method able to do so. The strategies also introduce a time-memory tradeoff that allows Mumemto to be tailored to more scenarios, including resource-limited settings.
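To make the notion of a multi-MUM concrete, the following is a minimal brute-force sketch on toy strings. It assumes the usual definition: a match that occurs exactly once in every sequence and cannot be extended left or right in all sequences simultaneously. It is not Mumemto's algorithm, which relies on compressed indexing and the partition-and-merge strategies described above. The function names `occurrences` and `naive_multi_mums` and the `min_len` parameter are hypothetical and chosen only for illustration.

```python
def occurrences(text, pattern):
    """Return the start positions of all (possibly overlapping) occurrences."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def naive_multi_mums(seqs, min_len=2):
    """Brute-force multi-MUMs (illustrative only, not Mumemto's method).

    A candidate is kept if it occurs exactly once in every sequence and the
    characters flanking it on the left, and on the right, are not identical
    across all sequences (a sequence boundary also blocks extension).
    """
    ref = seqs[0]
    found = set()
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            cand = ref[i:j]
            hits = [occurrences(s, cand) for s in seqs]
            if any(len(h) != 1 for h in hits):
                continue  # absent or repeated in at least one sequence
            starts = [h[0] for h in hits]
            end_offsets = [p + len(cand) for p in starts]
            # Flanking characters; None marks a sequence boundary.
            left = {s[p - 1] if p > 0 else None
                    for s, p in zip(seqs, starts)}
            right = {s[e] if e < len(s) else None
                     for s, e in zip(seqs, end_offsets)}
            left_maximal = None in left or len(left) > 1
            right_maximal = None in right or len(right) > 1
            if left_maximal and right_maximal:
                found.add(cand)
    return sorted(found)


if __name__ == "__main__":
    toy = ["ACGTTGCA", "TACGTAGCA", "GACGTCGCA"]
    print(naive_multi_mums(toy))  # ['ACGT', 'GCA']
```

This sketch checks every substring of the first sequence against every other sequence, so its cost grows far too quickly for real genomes. It is meant only to pin down the definition.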