A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.

Authors

Beller Timo, Ohlebusch Enno

Affiliation

Institute of Theoretical Computer Science, Ulm University, James-Franck-Ring O27/537, 89069 Ulm, Germany.

Publication information

Algorithms Mol Biol. 2016 Jul 18;11:20. doi: 10.1186/s13015-016-0083-7. eCollection 2016.

DOI:10.1186/s13015-016-0083-7

PMID:27437028

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4950428/

Abstract

BACKGROUND

Recently, Marcus et al. (Bioinformatics 30:3476-83, 2014) proposed using a compressed de Bruijn graph to describe the relationship between the genomes of many individuals or strains of the same or closely related species. They devised an O(n log g) time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497-504, 2016) improved their result.
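
To make the object concrete, here is a minimal Python sketch of what a compressed de Bruijn graph contains. It is not splitMEM or the algorithm of Baier et al.: it deliberately goes through the uncompressed graph, which those algorithms avoid. The function name `compressed_de_bruijn` and the toy genomes are assumptions made for illustration. Edges connect k-mers that occur consecutively in some genome, and each non-branching chain of k-mers is merged into a single node label.

```python
from collections import defaultdict


def compressed_de_bruijn(genomes, k):
    """Return the sorted node labels of the compressed de Bruijn graph.

    Naive illustration only: builds the uncompressed graph first.
    """
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for g in genomes:
        for i in range(len(g) - k + 1):
            nodes.add(g[i:i + k])
        for i in range(len(g) - k):
            u, v = g[i:i + k], g[i + 1:i + k + 1]
            succ[u].add(v)
            pred[v].add(u)

    def sole_link(u):
        """Return v if u -> v is the only edge out of u and into v."""
        if len(succ[u]) == 1:
            (v,) = succ[u]
            if len(pred[v]) == 1:
                return v
        return None

    def starts_chain(v):
        """True if v cannot be merged into a unique predecessor."""
        if len(pred[v]) != 1:
            return True
        (u,) = pred[v]
        return sole_link(u) != v

    labels, seen = [], set()

    def walk(start):
        # Extend the label by one character per merged k-mer.
        label, cur = start, start
        seen.add(start)
        while True:
            nxt = sole_link(cur)
            if nxt is None or nxt in seen:
                break
            label += nxt[-1]
            seen.add(nxt)
            cur = nxt
        labels.append(label)

    for v in sorted(nodes):
        if starts_chain(v):
            walk(v)
    for v in sorted(nodes):  # leftover nodes lie on isolated cycles
        if v not in seen:
            walk(v)
    return sorted(labels)


print(compressed_de_bruijn(["ACGTACC", "ACGTTCC"], 3))
```

In this toy input the two genomes share the prefix ACGT and then diverge, so the shared part becomes one node and each variant becomes its own node. Adjacent node labels overlap by k-1 characters.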

RESULTS

In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that adds the possibility to search for a pattern (e.g. an allele, a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.
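
As a rough illustration of what searching a pan-genome graph means, the sketch below checks whether a pattern is spelled by a path in the de Bruijn graph of a set of genomes. It is not the representation proposed in the paper, whose details the abstract does not give. It simply stores k-mers (nodes) and (k+1)-mers (edges) in Python sets, and the names `build_index` and `spelled_by_path` are hypothetical.

```python
def build_index(genomes, k):
    """Collect the k-mers (nodes) and (k+1)-mers (edges) of all genomes."""
    nodes = {g[i:i + k] for g in genomes for i in range(len(g) - k + 1)}
    edges = {g[i:i + k + 1] for g in genomes for i in range(len(g) - k)}
    return nodes, edges


def spelled_by_path(pattern, nodes, edges, k):
    """True if some path in the de Bruijn graph spells `pattern`.

    Only patterns with at least k characters are handled in this sketch.
    """
    if len(pattern) < k:
        raise ValueError("pattern must have at least k characters")
    if len(pattern) == k:
        return pattern in nodes
    # Every (k+1)-mer of the pattern must be an edge of the graph.
    return all(pattern[i:i + k + 1] in edges
               for i in range(len(pattern) - k))


genomes = ["AACGTCC", "TTCGTGG"]
nodes, edges = build_index(genomes, 3)
print(spelled_by_path("ACGTC", nodes, edges, 3))
print(spelled_by_path("ACGTG", nodes, edges, 3))
print(any("ACGTG" in g for g in genomes))
```

The pattern ACGTG is spelled by a path because the two genomes share the k-mer CGT, yet it is a substring of neither genome. A path in the graph can therefore combine segments from different genomes, so a search that must report true occurrences needs additional information beyond the graph topology.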

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7a8/4950428/c646c066de43/13015_2016_83_Fig1_HTML.jpg

Similar articles

A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.

Algorithms Mol Biol. 2016 Jul 18;11:20. doi: 10.1186/s13015-016-0083-7. eCollection 2016.

Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform.

Bioinformatics. 2016 Feb 15;32(4):497-504. doi: 10.1093/bioinformatics/btv603. Epub 2015 Oct 26.

Buffering updates enables efficient dynamic de Bruijn graphs.

Comput Struct Biotechnol J. 2021 Jul 6;19:4067-4078. doi: 10.1016/j.csbj.2021.06.047. eCollection 2021.

SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips.

Bioinformatics. 2014 Dec 15;30(24):3476-83. doi: 10.1093/bioinformatics/btu756. Epub 2014 Nov 13.

Pan-genome de Bruijn graph using the bidirectional FM-index.

BMC Bioinformatics. 2023 Oct 26;24(1):400. doi: 10.1186/s12859-023-05531-6.

Simplitigs as an efficient and scalable representation of de Bruijn graphs.

Genome Biol. 2021 Apr 6;22(1):96. doi: 10.1186/s13059-021-02297-z.

A space and time-efficient index for the compacted colored de Bruijn graph.

Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.

deGSM: Memory Scalable Construction Of Large Scale de Bruijn Graph.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2157-2166. doi: 10.1109/TCBB.2019.2913932. Epub 2021 Dec 8.

Erratum to: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.

Algorithms Mol Biol. 2016 Nov 28;11:28. doi: 10.1186/s13015-016-0090-8. eCollection 2016.

Integrating long-range connectivity information into de Bruijn graphs.

Bioinformatics. 2018 Aug 1;34(15):2556-2565. doi: 10.1093/bioinformatics/bty157.

Cited by

When less is more: sketching with minimizers in genomics.

Genome Biol. 2024 Oct 14;25(1):270. doi: 10.1186/s13059-024-03414-4.

Pan-genome de Bruijn graph using the bidirectional FM-index.

BMC Bioinformatics. 2023 Oct 26;24(1):400. doi: 10.1186/s12859-023-05531-6.

Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes.

Nat Methods. 2023 Aug;20(8):1213-1221. doi: 10.1038/s41592-023-01914-y. Epub 2023 Jun 26.

The design and construction of reference pangenome graphs with minigraph.

Genome Biol. 2020 Oct 16;21(1):265. doi: 10.1186/s13059-020-02168-z.

Design and evaluation of a sequence capture system for genome-wide SNP genotyping in highly heterozygous plant genomes: a case study with a keystone Neotropical hardwood tree genome.

DNA Res. 2018 Oct 1;25(5):535-545. doi: 10.1093/dnares/dsy023.

A space and time-efficient index for the compacted colored de Bruijn graph.

Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.

seq-seq-pan: building a computational pan-genome data structure on whole genome alignment.

BMC Genomics. 2018 Jan 15;19(1):47. doi: 10.1186/s12864-017-4401-3.

Erratum to: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.

Algorithms Mol Biol. 2016 Nov 28;11:28. doi: 10.1186/s13015-016-0090-8. eCollection 2016.

References

Computational pan-genomics: status, promises and challenges.

Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089.

Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage.

Algorithms Mol Biol. 2016 Apr 14;11:3. doi: 10.1186/s13015-016-0066-8. eCollection 2016.

Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform.

Bioinformatics. 2016 Feb 15;32(4):497-504. doi: 10.1093/bioinformatics/btv603. Epub 2015 Oct 26.

Improved genome inference in the MHC using a population reference graph.

Nat Genet. 2015 Jun;47(6):682-8. doi: 10.1038/ng.3257. Epub 2015 Apr 27.

SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips.

Bioinformatics. 2014 Dec 15;30(24):3476-83. doi: 10.1093/bioinformatics/btu756. Epub 2014 Nov 13.

Journaled string tree: a scalable data structure for analyzing thousands of similar genomes on your laptop.

Bioinformatics. 2014 Dec 15;30(24):3499-505. doi: 10.1093/bioinformatics/btu438. Epub 2014 Jul 15.

Short read alignment with populations of genomes.

Bioinformatics. 2013 Jul 1;29(13):i361-70. doi: 10.1093/bioinformatics/btt215.

De novo assembly and genotyping of variants using colored de Bruijn graphs.

Nat Genet. 2012 Jan 8;44(2):226-32. doi: 10.1038/ng.1028.

AlleleSeq: analysis of allele-specific expression and binding in a network framework.

Mol Syst Biol. 2011 Aug 2;7:522. doi: 10.1038/msb.2011.54.

Simultaneous alignment of short reads against multiple genomes.

Genome Biol. 2009;10(9):R98. doi: 10.1186/gb-2009-10-9-r98. Epub 2009 Sep 17.
