Medical Faculty, Institute for Medical Biometry and Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf 40225, Germany.
Helmholtz Centre for Infection Research (HZI), Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken 66123, Germany.
Bioinformatics. 2022 Sep 2;38(17):4217-4219. doi: 10.1093/bioinformatics/btac448.
With the rapid development of sequencing technology, accurate de novo genome assembly is now possible even for larger genomes. Graph-based representations of genomes arise both as part of the assembly process and in the context of pangenomes representing a population. In both cases, polymorphic loci lead to bubble structures in such graphs. Detecting bubbles is hence an important task when working with genomic variants in the context of genome graphs.
Here, we present a fast general-purpose tool, called BubbleGun, for detecting bubbles and superbubbles in genome graphs. Furthermore, BubbleGun detects and outputs runs of linearly connected bubbles and superbubbles, which we call bubble chains. We showcase its utility on de Bruijn graphs and compare our results to vg's snarl detection. We show that BubbleGun is considerably faster than vg, especially on larger graphs: it reports all bubbles in less than 30 min on a human sample de Bruijn graph of around 2 million nodes.
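To make the notions of superbubble and bubble chain concrete, the following is a minimal sketch, not BubbleGun's implementation or API. It assumes a plain directed graph given as an edge list (real genome graphs are bidirected, which this sketch ignores) and uses a source-by-source search in the style of the superbubble detection of Onodera et al. (2013). The function names `build_graph`, `find_superbubble`, `find_all_superbubbles` and `bubble_chains` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_graph(edges):
    """Return (children, parents) adjacency maps for a directed edge list."""
    children, parents = defaultdict(set), defaultdict(set)
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)
    return dict(children), dict(parents)


def find_superbubble(source, children, parents):
    """Return the sink of the superbubble starting at `source`, or None."""
    stack = [source]
    visited, seen = set(), set()
    while stack:
        node = stack.pop()
        visited.add(node)
        seen.discard(node)
        kids = children.get(node, set())
        if not kids:
            return None  # dead end (tip) inside the candidate bubble
        for child in kids:
            if child == source:
                return None  # cycle back to the source
            seen.add(child)
            # A child is expanded only once all of its parents were visited.
            if parents[child] <= visited:
                stack.append(child)
        # Exactly one node is left on the frontier: candidate sink.
        if len(stack) == 1 and len(seen) == 1 and stack[0] in seen:
            sink = stack[0]
            # Reject a sink -> source edge, and the trivial single-edge case
            # (an assumption of this sketch: a bubble needs interior nodes).
            if source in children.get(sink, set()) or len(visited) == 1:
                return None
            return sink
    return None


def find_all_superbubbles(edges):
    """Return a list of (source, sink) pairs, one per detected superbubble."""
    children, parents = build_graph(edges)
    nodes = sorted(set(children) | set(parents))
    bubbles = []
    for node in nodes:
        sink = find_superbubble(node, children, parents)
        if sink is not None:
            bubbles.append((node, sink))
    return bubbles


def bubble_chains(bubbles):
    """Link bubbles into chains where one bubble's sink is the next source."""
    sink_of = dict(bubbles)
    sinks = set(sink_of.values())
    chains = []
    for source in sink_of:
        if source in sinks:
            continue  # this bubble continues a chain started elsewhere
        chain = [source]
        while chain[-1] in sink_of:
            chain.append(sink_of[chain[-1]])
        chains.append(chain)
    return chains


# Two simple bubbles (1..4 and 4..7) that share node 4, followed by a tail.
example_edges = [(1, 2), (1, 3), (2, 4), (3, 4),
                 (4, 5), (4, 6), (5, 7), (6, 7), (7, 8)]
example_bubbles = find_all_superbubbles(example_edges)
example_chains = bubble_chains(example_bubbles)
```

In this toy graph the two bubbles share node 4, so they form a single chain. Running the search from every node is simple but not the most efficient strategy for large graphs; it is used here only to keep the sketch short.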
BubbleGun is available and documented as a Python3 package at https://github.com/fawaz-dabbaghieh/bubble_gun under the MIT license.
Supplementary data are available at Bioinformatics online.