Olbrich Jannik, Büchler Thomas, Ohlebusch Enno
Institute of Theoretical Computer Science, Ulm University, Ulm, 89069, Germany.
Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf104.
Since novel long-read sequencing technologies allow de novo assembly of many individuals of a species, high-quality assemblies are becoming widely available. For example, the recently published draft human pangenome reference was based on assemblies composed of contigs. There is an urgent need for a software tool that can generate a multiple alignment of genomes of the same species, because current multiple sequence alignment programs cannot deal with such a volume of data.
We show that combining a well-known anchor-based method with the technique of prefix-free parsing yields an approach that can generate multiple alignments on a pangenomic scale, provided that large-scale structural variants are rare. Furthermore, experiments with real-world data show that our software tool PANgenomic Anchor-based Multiple Alignment (PANAMA) significantly outperforms current state-of-the-art programs.
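The abstract does not explain how prefix-free parsing works. As a rough illustration only, the Python sketch below shows the generic technique and is not PANAMA's implementation. A Karp-Rabin rolling hash marks "trigger" windows of length `w` whose hash is divisible by `p`. The padded text is then cut into phrases that each start at one trigger and end with the next, so consecutive phrases overlap by exactly `w` characters. The input becomes a dictionary of distinct phrases plus a parse (a list of phrase ranks). The sentinel character `#`, the hash base and modulus, and the names `prefix_free_parse` and `reconstruct` are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

SENTINEL = "#"          # hypothetical padding character, assumed absent from the text
BASE = 256              # Karp-Rabin base (assumption)
MOD = (1 << 61) - 1     # large prime modulus for the rolling hash (assumption)


def prefix_free_parse(text: str, w: int, p: int) -> Tuple[List[str], List[int]]:
    """Sketch: split text into phrases delimited by trigger windows of length w."""
    assert SENTINEL not in text and w >= 1 and p >= 1
    s = SENTINEL * w + text + SENTINEL * w
    sentinel_window = SENTINEL * w
    top = pow(BASE, w - 1, MOD)  # weight of the leftmost character in a window

    # Hash of the first window; later windows are obtained by rolling the hash.
    h = 0
    for c in s[:w]:
        h = (h * BASE + ord(c)) % MOD

    triggers = []
    for i in range(len(s) - w + 1):
        if i > 0:
            # Drop s[i-1], shift, and append s[i+w-1].
            h = ((h - ord(s[i - 1]) * top) * BASE + ord(s[i + w - 1])) % MOD
        # The all-sentinel windows at both ends are always triggers.
        if s[i:i + w] == sentinel_window or h % p == 0:
            triggers.append(i)

    # A phrase runs from one trigger to the end of the next trigger window,
    # so consecutive phrases share exactly w characters.
    phrases = [s[a:b + w] for a, b in zip(triggers, triggers[1:])]

    # Distinct phrases, ranked lexicographically, form the dictionary.
    dictionary = sorted(set(phrases))
    rank: Dict[str, int] = {ph: r for r, ph in enumerate(dictionary)}
    parse = [rank[ph] for ph in phrases]
    return dictionary, parse


def reconstruct(dictionary: List[str], parse: List[int], w: int) -> str:
    """Invert the parse by gluing phrases on their w-character overlaps."""
    phrases = [dictionary[r] for r in parse]
    s = phrases[0] + "".join(ph[w:] for ph in phrases[1:])
    return s[w:len(s) - w]  # strip the sentinel padding
```

Usage: `d, parse = prefix_free_parse("GATTACA" * 20, 4, 7)` followed by `reconstruct(d, parse, 4)` returns the original string. The trigger decision depends only on local window content. Similar genomes therefore tend to produce the same phrases, which keeps the dictionary small relative to the input. Presumably this is why the technique suits repetitive, pangenome-scale data.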
Source code is available at: https://gitlab.com/qwerzuiop/panama, archived at swh:1:dir:e90c9f664995acca9063245cabdd97549cf39694.