Partitioned Multi-MUM finding for scalable pangenomics.

Author information

Shivakumar Vikram S, Langmead Ben

Affiliation

Department of Computer Science, Johns Hopkins University.

Publication information

bioRxiv. 2025 May 25:2025.05.20.654611. doi: 10.1101/2025.05.20.654611.

DOI:10.1101/2025.05.20.654611

PMID:40475428

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12139944/

Abstract

Pangenome collections are growing to hundreds of high-quality genomes. This necessitates scalable methods for constructing pangenome alignments that can incorporate newly sequenced assemblies. We previously developed Mumemto, which computes maximal unique matches (multi-MUMs) across pangenomes using compressed indexing. In this work, we extend Mumemto by introducing two new partitioning and merging strategies. Both strategies enable highly parallel, memory-efficient, and updatable computation of multi-MUMs. One of the strategies, called string-based merging, can also conduct the merges in a way that follows the shape of a phylogenetic tree, naturally yielding the multi-MUMs for the tree's internal nodes as well as the root. With these strategies, Mumemto now scales to 474 human haplotypes, the only multi-MUM method able to do so. The new strategies also introduce a time-memory tradeoff that allows Mumemto to be tailored to more scenarios, including resource-limited settings.
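The partition-then-merge idea can be illustrated on toy data. The sketch below is not Mumemto's implementation: it uses brute force instead of compressed indexing, ignores reverse complements, and confirms uniqueness by re-scanning the input sequences, a step the abstract does not describe. All function names (`multi_mums`, `merge_partitions`, and the helpers) and the `min_len` parameter are hypothetical and chosen only for illustration. It relies on one observation: a multi-MUM of the whole collection occurs exactly once in every sequence of each partition, so it must lie inside a multi-MUM of each partition. Candidates for the merged set can therefore be drawn from maximal common substrings of the partitions' multi-MUM strings.

```python
from itertools import product


def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def occurs_once_in_all(pattern, seqs):
    """True if pattern occurs exactly once in every sequence."""
    return all(count_occurrences(s, pattern) == 1 for s in seqs)


def multi_mums(seqs, min_len=1):
    """Brute-force multi-MUMs of a non-empty list of sequences.

    A multi-MUM here is a substring that occurs exactly once in every
    sequence and cannot be extended by one character on either side
    while still occurring in every sequence.
    """
    ref = seqs[0]
    found = set()
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            cand = ref[i:j]
            if not occurs_once_in_all(cand, seqs):
                continue
            # One-character extensions, taken from the reference context.
            extensions = []
            if i > 0:
                extensions.append(ref[i - 1:j])
            if j < len(ref):
                extensions.append(ref[i:j + 1])
            if not any(all(e in s for s in seqs) for e in extensions):
                found.add(cand)
    return found


def maximal_common_substrings(a, b):
    """All maximal exact matches between strings a and b (quadratic scan)."""
    out = set()
    for i, j in product(range(len(a)), range(len(b))):
        if a[i] != b[j]:
            continue
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            continue  # the match could be extended to the left
        k = 0
        while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
            k += 1
        out.add(a[i:i + k])
    return out


def merge_partitions(mums_a, mums_b, seqs_a, seqs_b, min_len=1):
    """Merge two partitions' multi-MUMs into multi-MUMs of their union.

    Simplification (an assumption of this sketch): uniqueness is checked
    by re-scanning every input sequence.
    """
    all_seqs = seqs_a + seqs_b
    merged = set()
    for a, b in product(mums_a, mums_b):
        for cand in maximal_common_substrings(a, b):
            if len(cand) >= min_len and occurs_once_in_all(cand, all_seqs):
                merged.add(cand)
    return merged


# Toy data: two partitions of two short sequences each.
part_a = ["TTACGGATCCA", "GACGGATCTT"]
part_b = ["CACGGATCAG", "ACGGATCGGT"]
mums_a = multi_mums(part_a, min_len=3)
mums_b = multi_mums(part_b, min_len=3)
print(sorted(merge_partitions(mums_a, mums_b, part_a, part_b, min_len=3)))
```

Each partition can be processed independently, which is where the parallelism comes from. Under the same assumptions, applying `merge_partitions` repeatedly along the branches of a tree would yield a multi-MUM set at every internal node, with the root holding the set for the full collection.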

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/299a/12139944/96928c7e77da/nihpp-2025.05.20.654611v1-f0001.jpg
