Maximum likelihood genome assembly.

Authors

Medvedev Paul, Brudno Michael

Affiliation

Department of Computer Science, University of Toronto, Toronto, Canada.

Publication information

J Comput Biol. 2009 Aug;16(8):1101-16. doi: 10.1089/cmb.2009.0047.

Abstract

Whole genome shotgun assembly is the process of taking many short sequenced segments (reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial time algorithm for the assembly of a double-stranded genome. Furthermore, we propose a maximum likelihood framework for assembling the genome that is the most likely source of the reads, in lieu of the standard maximum parsimony approach (which finds the shortest genome subject to some constraints). In this setting, we give a bidirected network flow-based algorithm that, by taking advantage of high coverage, accurately estimates the copy counts of repeats in a genome. Our second algorithm combines these predicted copy counts with matepair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from Escherichia coli and predict copy counts with extremely high accuracy, while assembling long contigs.
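
To make the double-stranded modelling and the coverage-based copy-count idea concrete, here is a minimal Python sketch. It is not the bidirected network flow algorithm the abstract describes; it only illustrates two simpler ingredients under stated assumptions: (1) a k-mer and its reverse complement are treated as the same molecule by mapping both to one canonical form, and (2) with uniform high coverage, a naive copy-count guess is the observed count divided by the expected count of a single-copy k-mer. The function names and the `coverage_per_copy` parameter are hypothetical and chosen for this illustration.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Map a k-mer and its reverse complement to one shared representative."""
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(reads, k):
    """Count k-mers across reads, merging the two strands of each k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts


def naive_copy_counts(counts, coverage_per_copy):
    """Naive estimate (assumption, not the paper's method): observed count
    divided by the expected count of a single-copy k-mer, rounded, at least 1."""
    return {
        kmer: max(1, round(count / coverage_per_copy))
        for kmer, count in counts.items()
    }


if __name__ == "__main__":
    # Two reads taken from opposite strands of the same short region.
    reads = ["AACG", "CGTT"]
    print(count_canonical_kmers(reads, k=3))
    print(naive_copy_counts({"AAC": 20, "ACG": 61}, coverage_per_copy=20.0))
```

In this toy input the two reads come from opposite strands, so their k-mers collapse onto the same canonical keys. The per-k-mer rounding ignores the graph structure entirely; a flow-based formulation, as the abstract describes, would additionally require the estimated counts to be consistent along the graph.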
