

Partitioned Multi-MUM finding for scalable pangenomics.

Authors

Shivakumar Vikram S, Langmead Ben

Affiliation

Department of Computer Science, Johns Hopkins University.

Publication

bioRxiv. 2025 May 25:2025.05.20.654611. doi: 10.1101/2025.05.20.654611.

Abstract

Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly sequenced assemblies. We previously developed Mumemto, which computes maximal unique matches (multi-MUMs) across pangenomes using compressed indexing. In this work, we extend Mumemto by introducing two new partitioning and merging strategies. Both strategies enable highly parallel, memory-efficient, and updatable computation of multi-MUMs. One of the strategies, called string-based merging, can also conduct the merges in an order that follows the shape of a phylogenetic tree, naturally yielding the multi-MUMs for the tree's internal nodes as well as for the root. With these strategies, Mumemto now scales to 474 human haplotypes and is the only multi-MUM method able to do so. The strategies also introduce a time-memory tradeoff that allows Mumemto to be tailored to more scenarios, including resource-limited settings.
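To make the multi-MUM notion concrete, the following is a minimal brute-force sketch in Python. It is not Mumemto's algorithm, which relies on compressed indexing; it only spells out the definition on toy input. A multi-MUM is taken here to be a substring that occurs exactly once in every sequence and cannot be extended in either direction while still matching all of them. The function names (`count_occurrences`, `multi_mums`, `each_lies_in_some`) and the toy sequences are assumptions made for illustration. The last lines check one property that makes partition-then-merge approaches plausible: a match that is unique in every sequence of the whole collection is also unique in every sequence of any subset, so it lies inside one of that subset's multi-MUMs.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def multi_mums(seqs: list[str], min_len: int = 1) -> list[str]:
    """Brute-force multi-MUMs of a small collection (illustration only)."""
    ref = seqs[0]
    # Every substring of the first sequence that occurs exactly once
    # in each sequence of the collection.
    unique = set()
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            candidate = ref[i:j]
            if all(count_occurrences(s, candidate) == 1 for s in seqs):
                unique.add(candidate)
    # A unique match is maximal when no longer unique match contains it.
    return sorted(
        u for u in unique
        if not any(u != v and u in v for v in unique)
    )


def each_lies_in_some(matches: list[str], containers: list[str]) -> bool:
    """True if every match is a substring of at least one container."""
    return all(any(m in c for c in containers) for m in matches)


# Hypothetical toy partitions; real inputs would be whole assemblies.
part_a = ["GATTACAGG", "CGATTACAT"]
part_b = ["TTACAGGA", "ATTACAGC"]

mums_a = multi_mums(part_a)
mums_b = multi_mums(part_b)
mums_all = multi_mums(part_a + part_b)

print("partition A:", mums_a)
print("partition B:", mums_b)
print("whole collection:", mums_all)
print("contained in A's multi-MUMs:", each_lies_in_some(mums_all, mums_a))
print("contained in B's multi-MUMs:", each_lies_in_some(mums_all, mums_b))
```

The containment check only shows why per-partition results are a useful starting point. A real merge must also confirm that a candidate does not occur elsewhere in any sequence outside the partition's multi-MUMs, which this sketch does not attempt.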


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/299a/12139944/96928c7e77da/nihpp-2025.05.20.654611v1-f0001.jpg
