Fragment assembly with double-barreled data.

Author information

Pevzner P A, Tang H

Affiliation

Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA 92093, USA.

Publication information

Bioinformatics. 2001;17 Suppl 1:S225-33. doi: 10.1093/bioinformatics/17.suppl_1.s225.

DOI:10.1093/bioinformatics/17.suppl_1.s225

PMID:11473013

Abstract

For the last twenty years, fragment assembly has been dominated by the "overlap-layout-consensus" algorithms that are used in all currently available assembly tools. However, the limits of these algorithms are being tested in the era of genomic sequencing, and it is not clear whether they are the best choice for large-scale assemblies. Although the "overlap-layout-consensus" approach proved useful in assembling clones, it faces difficulties in genomic assemblies: the existing algorithms make assembly errors even in bacterial genomes. We abandoned the "overlap-layout-consensus" approach in favour of a new Eulerian Superpath approach that outperforms the existing algorithms for genomic fragment assembly (Pevzner et al., 2001, In Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB-01), 256-26). In this paper we describe our new EULER-DB algorithm that, similarly to the Celera assembler, takes advantage of clone-end sequencing by using the double-barreled data. However, in contrast to the Celera assembler, EULER-DB does not mask repeats but instead uses them as a powerful tool for contig ordering. We also describe a new approach to the Copy Number Problem: "How many times is a given repeat present in the genome?" For long, nearly perfect repeats this question is notoriously difficult, and some copies of such repeats may be "lost" in genomic assemblies. We describe our EULER-CN algorithm for the Copy Number Problem, which has proved successful in difficult sequencing projects.
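The abstract contrasts "overlap-layout-consensus" with an Eulerian approach. As a minimal sketch of the general Eulerian idea only (a textbook de Bruijn graph plus Hierholzer's algorithm, not the EULER-DB or EULER-CN algorithms described in the paper), the Python below breaks reads into k-mers, treats each distinct k-mer as an edge between its (k-1)-mer prefix and suffix, and spells the genome along an Eulerian path. It assumes error-free reads, full k-mer coverage, and no repeated k-mers; the helper names `de_bruijn_edges`, `eulerian_path`, and `assemble` are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers.

    Assumption: reads are error-free and no k-mer occurs twice in the genome,
    so collapsing k-mers into a set loses no information.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: return the nodes of a path using every edge once."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # A linear genome starts at the node with one more outgoing than incoming edge.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), None)
    if start is None:  # balanced graph (e.g. circular genome): start anywhere
        start = next(iter(graph))
    adj = {u: list(vs) for u, vs in graph.items()}  # mutable copy of edge lists
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def assemble(reads, k):
    """Spell a sequence from the Eulerian path: first node plus each last base."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(assemble(["GATTAC", "TACAGG", "CAGGCT"], 4))  # GATTACAGGCT
```

Note that this sketch does not resolve repeats: once a repeat is longer than k-1, the graph collapses its copies and a single Eulerian path no longer identifies the genome uniquely.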

Similar articles

An Eulerian path approach to DNA fragment assembly.

Proc Natl Acad Sci U S A. 2001 Aug 14;98(17):9748-53. doi: 10.1073/pnas.171285098.

Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph.

Brief Funct Genomics. 2012 Jan;11(1):25-37. doi: 10.1093/bfgp/elr035. Epub 2011 Dec 19.

Correcting base-assignment errors in repeat regions of shotgun assembly.

IEEE/ACM Trans Comput Biol Bioinform. 2007 Jan-Mar;4(1):54-64. doi: 10.1109/TCBB.2007.1005.

Assembly algorithms for next-generation sequencing data.

Genomics. 2010 Jun;95(6):315-27. doi: 10.1016/j.ygeno.2010.03.001. Epub 2010 Mar 6.

Information-optimal genome assembly via sparse read-overlap graphs.

Bioinformatics. 2016 Sep 1;32(17):i494-i502. doi: 10.1093/bioinformatics/btw450.

Whole-genome sequencing and assembly with high-throughput, short-read technologies.

PLoS One. 2007 May 30;2(5):e484. doi: 10.1371/journal.pone.0000484.

Sequencing by hybridization in the presence of hybridization errors.

Genome Inform Ser Workshop Genome Inform. 2000;11:53-62.

HISEA: HIerarchical SEed Aligner for PacBio data.

BMC Bioinformatics. 2017 Dec 19;18(1):564. doi: 10.1186/s12859-017-1953-9.

Consensus generation and variant detection by Celera Assembler.

Bioinformatics. 2008 Apr 15;24(8):1035-40. doi: 10.1093/bioinformatics/btn074. Epub 2008 Mar 4.

Cited by

SuperPATH-Current Status of Evidence and Further Investigations: A Scoping Review and Quality Assessment.

J Clin Med. 2023 Aug 19;12(16):5395. doi: 10.3390/jcm12165395.

MetaCRS: unsupervised clustering of contigs with the recursive strategy of reducing metagenomic dataset's complexity.

BMC Bioinformatics. 2022 Jan 20;22(Suppl 12):315. doi: 10.1186/s12859-021-04227-z.

Empirical evaluation of methods for genome assembly.

PeerJ Comput Sci. 2021 Jul 9;7:e636. doi: 10.7717/peerj-cs.636. eCollection 2021.

Accurate determination of node and arc multiplicities in de bruijn graphs using conditional random fields.

BMC Bioinformatics. 2020 Sep 14;21(1):402. doi: 10.1186/s12859-020-03740-x.

An Efficient, Scalable, and Exact Representation of High-Dimensional Color Information Enabled Using de Bruijn Graph Search.

J Comput Biol. 2020 Apr;27(4):485-499. doi: 10.1089/cmb.2019.0322. Epub 2020 Mar 16.

SRAssembler: Selective Recursive local Assembly of homologous genomic regions.

BMC Bioinformatics. 2019 Jul 2;20(1):371. doi: 10.1186/s12859-019-2949-4.

HINGE: long-read assembly achieves optimal repeat resolution.

Genome Res. 2017 May;27(5):747-756. doi: 10.1101/gr.216465.116. Epub 2017 Mar 20.

Spaced Seed Data Structures for De Novo Assembly.

Int J Genomics. 2015;2015:196591. doi: 10.1155/2015/196591. Epub 2015 Oct 11.

MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities.

PeerJ. 2015 Aug 27;3:e1165. doi: 10.7717/peerj.1165. eCollection 2015.

Next-generation sequence assembly: four stages of data processing and computational challenges.

PLoS Comput Biol. 2013;9(12):e1003345. doi: 10.1371/journal.pcbi.1003345. Epub 2013 Dec 12.