Mumemto: efficient maximal matching across pangenomes.

Author information

Shivakumar Vikram S, Langmead Ben

Affiliations

Department of Computer Science, Johns Hopkins University.

Publication information

bioRxiv. 2025 Jan 5:2025.01.05.631388. doi: 10.1101/2025.01.05.631388.

Abstract

Aligning genomes into common coordinates is central to pangenome analysis and construction, but it is also computationally expensive. Multi-sequence maximal unique matches (multi-MUMs) are guideposts for core genome alignments, helping to frame and solve the multiple alignment problem. We introduce Mumemto, a tool that computes multi-MUMs and other match types across large pangenomes. Mumemto allows for visualization of synteny, reveals aberrant assemblies and scaffolds, and highlights pangenome conservation and structural variation. Mumemto computes multi-MUMs across 320 human genome assemblies (960 GB) in 25.7 hours with under 800 GB of memory, and across hundreds of fungal genome assemblies in minutes. Mumemto is implemented in C++ and Python and is available open-source at https://github.com/vikshiv/mumemto.
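
To make the notion of a multi-MUM concrete, the following is a minimal brute-force sketch, not Mumemto's algorithm or API. It assumes a multi-MUM is a substring that occurs exactly once in every input sequence and cannot be extended by one character on either side while still matching in all sequences. Reverse complements are ignored, and the function names (`count_overlapping`, `naive_multi_mums`) and the `min_len` parameter are hypothetical, chosen only for illustration. The approach is only practical for toy inputs.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def naive_multi_mums(seqs, min_len=2):
    """Brute-force multi-MUM finder for small inputs (illustration only).

    Returns a list of (match, start_positions) tuples, where
    start_positions[k] is the 0-based offset of the match in seqs[k].
    """
    first = seqs[0]
    mums = []
    for i in range(len(first)):
        for j in range(i + min_len, len(first) + 1):
            cand = first[i:j]
            # Uniqueness: exactly one occurrence in every sequence.
            if any(count_overlapping(s, cand) != 1 for s in seqs):
                continue
            starts = [s.find(cand) for s in seqs]
            ends = [p + len(cand) for p in starts]
            # Left-extendable if every sequence has the same preceding character.
            left_ext = all(p > 0 for p in starts) and \
                len({s[p - 1] for s, p in zip(seqs, starts)}) == 1
            # Right-extendable if every sequence has the same following character.
            right_ext = all(e < len(s) for s, e in zip(seqs, ends)) and \
                len({s[e] for s, e in zip(seqs, ends)}) == 1
            # Maximality: keep only matches that cannot grow in either direction.
            if not left_ext and not right_ext:
                mums.append((cand, starts))
    return mums


if __name__ == "__main__":
    toy = ["ACGTTGCA", "TTACGTAG", "GGACGTCC"]
    for match, positions in naive_multi_mums(toy):
        print(match, positions)
```

On the three toy sequences above, the only match of length two or more that is both unique in every sequence and maximal is `ACGT`, at offsets 0, 2, and 2. Enumerating substrings this way is far too slow for real genomes; it serves only to pin down the definition that a pangenome-scale tool has to satisfy.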

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7647/11722392/07b12c3f226b/nihpp-2025.01.05.631388v1-f0001.jpg
