Generating multiple alignments on a pangenomic scale.

Authors

Olbrich Jannik, Büchler Thomas, Ohlebusch Enno

Affiliation

Institute of Theoretical Computer Science, Ulm University, Ulm, 89069, Germany.

Publication

Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf104.

DOI:10.1093/bioinformatics/btaf104

PMID:40097267

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11928754/

Abstract

MOTIVATION

Since novel long-read sequencing technologies allow for de novo assembly of many individuals of a species, high-quality assemblies are becoming widely available. For example, the recently published draft human pangenome reference was based on assemblies composed of contigs. There is an urgent need for a software tool that is able to generate a multiple alignment of genomes of the same species because current multiple sequence alignment programs cannot deal with such a volume of data.

RESULTS

We show that the combination of a well-known anchor-based method with the technique of prefix-free parsing yields an approach that is able to generate multiple alignments on a pangenomic scale, provided that large-scale structural variants are rare. Furthermore, experiments with real-world data show that our software tool PANgenomic Anchor-based Multiple Alignment (PANAMA) significantly outperforms current state-of-the-art programs.

AVAILABILITY AND IMPLEMENTATION

Source code is available at: https://gitlab.com/qwerzuiop/panama, archived at swh:1:dir:e90c9f664995acca9063245cabdd97549cf39694.
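
The abstract names an anchor-based method without describing it. As a rough illustration of the general idea only, the sketch below finds k-mers that occur exactly once in every input sequence and then selects the longest collinear chain of them; the regions between consecutive anchors would then be aligned separately. This is not PANAMA's implementation: the function names (`unique_kmers`, `find_anchors`, `chain_anchors`), the use of fixed-length k-mers instead of maximal matches, and the quadratic chaining step are all assumptions made for brevity, and prefix-free parsing is not shown.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def find_anchors(seqs, k):
    """Return sorted position tuples of k-mers that are unique in every sequence.

    Each tuple holds one start position per input sequence.
    """
    tables = [unique_kmers(s, k) for s in seqs]
    shared = set(tables[0]).intersection(*tables[1:])
    return sorted(tuple(t[kmer] for t in tables) for kmer in shared)


def chain_anchors(anchors):
    """Pick a longest chain of anchors that is strictly increasing in all sequences.

    Simple O(n^2) dynamic programme over anchors sorted by their first coordinate.
    Overlaps between consecutive anchors are not resolved in this sketch.
    """
    n = len(anchors)
    if n == 0:
        return []
    best = [1] * n   # best[i]: length of the longest chain ending at anchor i
    prev = [-1] * n  # prev[i]: predecessor of anchor i in that chain
    for i in range(n):
        for j in range(i):
            collinear = all(a < b for a, b in zip(anchors[j], anchors[i]))
            if collinear and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(anchors[i])
        i = prev[i]
    return chain[::-1]
```

Calling `chain_anchors(find_anchors(seqs, k))` on a list of strings yields the anchor positions that split the sequences into smaller, independently alignable segments. The collinearity requirement is one way to see why such an approach would depend on large-scale structural variants being rare: an anchor inside a rearranged region cannot join the chain.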
