A representation of a compressed de Bruijn graph for pan-genome analysis that enables search.

Author information

Beller Timo, Ohlebusch Enno

Affiliation

Institute of Theoretical Computer Science, Ulm University, James-Franck-Ring O27/537, 89069 Ulm, Germany.

Publication information

Algorithms Mol Biol. 2016 Jul 18;11:20. doi: 10.1186/s13015-016-0083-7. eCollection 2016.

Abstract

BACKGROUND

Recently, Marcus et al. (Bioinformatics 30:3476-83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related species. They devised an $O(n \log g)$ time algorithm called splitMEM that constructs this graph directly (i.e., without using the uncompressed de Bruijn graph) based on a suffix tree, where n is the total length of the genomes and g is the length of the longest genome. Baier et al. (Bioinformatics 32:497-504, 2016) improved their result.
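To make the object under discussion concrete, the following is a minimal Python sketch of a compressed de Bruijn graph built the naive way: it first collects all k-mers and their adjacencies, then merges every non-branching chain into a single node. This is deliberately not splitMEM and not the method of Baier et al.; both of those avoid the uncompressed graph, whereas this sketch materializes it. The function name `compressed_de_bruijn_graph`, the toy genomes, and the choice k = 3 are assumptions made for illustration only.

```python
from collections import defaultdict


def compressed_de_bruijn_graph(genomes, k):
    """Naively build a compressed de Bruijn graph (illustrative sketch only).

    Returns (sorted node labels, set of (source_label, target_label) edges).
    """
    succ = defaultdict(set)  # k-mer -> k-mers that follow it in some genome
    pred = defaultdict(set)  # k-mer -> k-mers that precede it in some genome
    kmers = set()
    for genome in genomes:
        for i in range(len(genome) - k + 1):
            kmers.add(genome[i:i + k])
        for i in range(len(genome) - k):
            u, v = genome[i:i + k], genome[i + 1:i + k + 1]
            succ[u].add(v)
            pred[v].add(u)

    def starts_node(v):
        # v opens a compressed node unless it has exactly one predecessor
        # and that predecessor has v as its only successor.
        if len(pred[v]) != 1:
            return True
        (u,) = pred[v]
        return len(succ[u]) != 1

    visited = set()
    label_of_first = {}  # first k-mer of a compressed node -> node label

    def walk(first):
        # Extend the label along the chain while it stays non-branching.
        label, cur = first, first
        visited.add(first)
        while len(succ[cur]) == 1:
            (nxt,) = succ[cur]
            if len(pred[nxt]) != 1 or nxt in visited:
                break
            label += nxt[-1]
            visited.add(nxt)
            cur = nxt
        label_of_first[first] = label

    ordered = sorted(kmers)
    for v in ordered:
        if starts_node(v):
            walk(v)
    for v in ordered:  # whatever is left lies on an isolated cycle
        if v not in visited:
            walk(v)

    edges = set()
    for label in label_of_first.values():
        for v in succ[label[-k:]]:
            edges.add((label, label_of_first[v]))
    return sorted(label_of_first.values()), edges


if __name__ == "__main__":
    # Two toy "genomes" that share a prefix and then diverge.
    nodes, edges = compressed_de_bruijn_graph(["AACGTCC", "AACGACC"], 3)
    print(nodes)
    print(sorted(edges))
```

On the two toy strings, the shared prefix collapses into one node and each divergent tail becomes its own node, which is the kind of structure the compressed graph is meant to expose across strains.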

RESULTS

In this paper, we propose a new space-efficient representation of the compressed de Bruijn graph that makes it possible to search for a pattern (e.g. an allele, i.e. a variant form of a gene) within the pan-genome. The ability to search within the pan-genome graph is of utmost importance and is a design goal of pan-genome data structures.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7a8/4950428/c646c066de43/13015_2016_83_Fig1_HTML.jpg

References

1. Computational pan-genomics: status, promises and challenges. Brief Bioinform. 2018 Jan 1;19(1):118-135. doi: 10.1093/bib/bbw089.
2. Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage. Algorithms Mol Biol. 2016 Apr 14;11:3. doi: 10.1186/s13015-016-0066-8. eCollection 2016.
3. Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform. Bioinformatics. 2016 Feb 15;32(4):497-504. doi: 10.1093/bioinformatics/btv603. Epub 2015 Oct 26.
4. Improved genome inference in the MHC using a population reference graph. Nat Genet. 2015 Jun;47(6):682-8. doi: 10.1038/ng.3257. Epub 2015 Apr 27.
5. SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips. Bioinformatics. 2014 Dec 15;30(24):3476-83. doi: 10.1093/bioinformatics/btu756. Epub 2014 Nov 13.
6. Journaled string tree: a scalable data structure for analyzing thousands of similar genomes on your laptop. Bioinformatics. 2014 Dec 15;30(24):3499-505. doi: 10.1093/bioinformatics/btu438. Epub 2014 Jul 15.
7. Short read alignment with populations of genomes. Bioinformatics. 2013 Jul 1;29(13):i361-70. doi: 10.1093/bioinformatics/btt215.
8. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet. 2012 Jan 8;44(2):226-32. doi: 10.1038/ng.1028.
9. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011 Aug 2;7:522. doi: 10.1038/msb.2011.54.
10. Simultaneous alignment of short reads against multiple genomes. Genome Biol. 2009;10(9):R98. doi: 10.1186/gb-2009-10-9-r98. Epub 2009 Sep 17.
