Shivakumar Vikram S, Langmead Ben
Department of Computer Science, Johns Hopkins University.
bioRxiv. 2025 Jan 5:2025.01.05.631388. doi: 10.1101/2025.01.05.631388.
Aligning genomes into common coordinates is central to pangenome analysis and construction, but it is also computationally expensive. Multi-sequence maximal unique matches (multi-MUMs) are guideposts for core genome alignments, helping to frame and solve the multiple alignment problem. We introduce Mumemto, a tool that computes multi-MUMs and other match types across large pangenomes. Mumemto allows for visualization of synteny, reveals aberrant assemblies and scaffolds, and highlights pangenome conservation and structural variation. Mumemto computes multi-MUMs across 320 human genome assemblies (960 GB) in 25.7 hours with under 800 GB of memory, and across hundreds of fungal genome assemblies in minutes. Mumemto is implemented in C++ and Python and is available as open-source software at https://github.com/vikshiv/mumemto.
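To make the term multi-MUM concrete, here is a minimal brute-force sketch in Python. It is not Mumemto's algorithm, which the abstract does not describe, and it would not scale beyond toy inputs. It assumes a multi-MUM is a substring that occurs exactly once in every input sequence and cannot be extended left or right while still matching in all of them. It also assumes exact matches on the forward strand only. The function names and the `min_len` parameter are hypothetical, chosen for illustration.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def naive_multi_mums(seqs, min_len=3):
    """Return (substring, start positions) for each multi-MUM, by brute force.

    Illustrative only: enumerates every substring of the first sequence,
    so the running time is far too high for real genomes.
    """
    ref = seqs[0]
    mums = []
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            sub = ref[i:j]
            # Uniqueness: exactly one occurrence in every sequence.
            if any(count_overlapping(s, sub) != 1 for s in seqs):
                continue
            starts = [s.find(sub) for s in seqs]
            # Flanking characters; None marks a sequence boundary.
            left = {s[p - 1] if p > 0 else None
                    for s, p in zip(seqs, starts)}
            right = {s[p + len(sub)] if p + len(sub) < len(s) else None
                     for s, p in zip(seqs, starts)}
            # The match is extendable only if every sequence has the
            # same flanking character on that side.
            left_extendable = len(left) == 1 and None not in left
            right_extendable = len(right) == 1 and None not in right
            if not left_extendable and not right_extendable:
                mums.append((sub, starts))
    return mums


print(naive_multi_mums(["GATTACAT", "CATTACAG", "TATTACAC"]))
# prints [('ATTACA', [1, 1, 1])]
```

In the toy input, `ATTACA` is shared by all three sequences, occurs once in each, and is flanked by differing characters on both sides, so it is the only match reported. Shorter shared substrings such as `TTAC` are excluded because they can be extended in every sequence.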