Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers.

Authors

Medvedev Paul, Pham Son, Chaisson Mark, Tesler Glenn, Pevzner Pavel

Affiliation

Department of Computer Science and Engineering, University of California, San Diego, California, USA.

Publication

J Comput Biol. 2011 Nov;18(11):1625-34. doi: 10.1089/cmb.2011.0151. Epub 2011 Oct 14.

Abstract

The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated perfect data, we argue that this can effectively improve the contig sizes in assembly.
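
To make the idea concrete, here is a minimal Python sketch of how pairing information can be placed in the graph structure itself. It is not the authors' implementation. It assumes error-free data and that the two k-mers of every pair start exactly `d` positions apart; the function names, the toy sequence, and the values `k=3` and `d=4` are illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn(seq, k):
    """Standard de Bruijn graph: each (k-1)-mer maps to its successor (k-1)-mers."""
    graph = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def paired_de_bruijn(seq, k, d):
    """Paired variant (sketch): nodes are pairs of (k-1)-mers.

    Assumption: the two k-mers of each pair start exactly d positions apart.
    """
    graph = defaultdict(set)
    for i in range(len(seq) - k - d + 1):
        a = seq[i:i + k]          # left k-mer
        b = seq[i + d:i + d + k]  # its mate, d positions downstream
        graph[(a[:-1], b[:-1])].add((a[1:], b[1:]))
    return graph


def branching_nodes(graph):
    """Count nodes with more than one outgoing edge (ambiguous extensions)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


if __name__ == "__main__":
    # Toy sequence in which the 3-mer ACG occurs twice with different continuations.
    seq = "TACGTCACGGATTGC"
    print(branching_nodes(de_bruijn(seq, 3)))
    print(branching_nodes(paired_de_bruijn(seq, 3, 4)))
```

On this toy sequence the plain graph has one branching node, `CG`, because the repeated `ACG` is followed by `T` in one copy and by `G` in the other. In the paired graph each copy of the repeat is attached to a different mate, so the two copies become distinct nodes and no branching remains.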
