Mumemto: efficient maximal matching across pangenomes.

Author information

Shivakumar Vikram S, Langmead Ben

Affiliation

Department of Computer Science, Johns Hopkins University, Baltimore, USA.

Publication information

Genome Biol. 2025 Jun 17;26(1):169. doi: 10.1186/s13059-025-03644-0.

DOI:10.1186/s13059-025-03644-0

PMID:40528225

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12172372/

Abstract

Aligning genomes into common coordinates is central to pangenome construction, though computationally expensive. Multi-sequence maximal unique matches (multi-MUMs) help to frame and solve the multiple alignment problem. We introduce Mumemto, a tool that computes multi-MUMs and other match types across large pangenomes. Mumemto allows for visualization of synteny, reveals aberrant assemblies and scaffolds, and highlights pangenome conservation and structural variation. Mumemto computes multi-MUMs across 320 human assemblies (960 GB) in 25.7 h with 800 GB of memory and hundreds of fungal assemblies in minutes. Mumemto is implemented in C++ and Python and available open-source at https://github.com/vikshiv/mumemto (v1.1.1 at doi.org/10.5281/zenodo.15053447).
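To make the term concrete, a multi-MUM can be read as a substring that occurs exactly once in every sequence and cannot be extended to the left or right while still matching in all of them. The following is a minimal brute-force sketch of that definition for toy inputs only. It is not Mumemto's algorithm or API: the function names `occurrences` and `naive_multi_mums` and the `min_len` parameter are hypothetical, the sketch ignores reverse complements, and its running time makes it unusable on real assemblies.

```python
def occurrences(text, pattern):
    """Return every start position of pattern in text, counting overlaps."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def naive_multi_mums(seqs, min_len=3):
    """Brute-force multi-MUMs: (substring, start position per sequence).

    Illustrative only; names and parameters are assumptions, not Mumemto's.
    """
    first = seqs[0]
    mums = []
    for i in range(len(first)):
        for j in range(i + min_len, len(first) + 1):
            candidate = first[i:j]
            hits = [occurrences(s, candidate) for s in seqs]
            # Uniqueness: exactly one occurrence in every sequence.
            if any(len(h) != 1 for h in hits):
                continue
            starts = [h[0] for h in hits]
            ends = [p + len(candidate) for p in starts]
            # Left-extendable if every sequence has the same preceding base.
            left_ext = all(p > 0 for p in starts) and len(
                {s[p - 1] for s, p in zip(seqs, starts)}
            ) == 1
            # Right-extendable if every sequence has the same following base.
            right_ext = all(e < len(s) for s, e in zip(seqs, ends)) and len(
                {s[e] for s, e in zip(seqs, ends)}
            ) == 1
            # Maximality: keep only matches that cannot grow in either direction.
            if not left_ext and not right_ext:
                mums.append((candidate, starts))
    return mums


toy = ["GATTACAGG", "CCATTACAT", "TATTACAAA"]
print(naive_multi_mums(toy))  # [('ATTACA', [1, 2, 1])]
```

In the toy input, `ATTACA` is the only substring of length 3 or more that is unique in all three sequences and flanked by mismatching bases on both sides; shorter pieces such as `ATT` are unique but extendable, so they are discarded.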
