Improving Phrap-based assembly of the rat using "reliable" overlaps.

Author information

Roberts Michael, Zimin Aleksey V, Hayes Wayne, Hunt Brian R, Ustun Cevat, White James R, Havlak Paul, Yorke James

Affiliation

Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America.

Publication information

PLoS One. 2008 Mar 19;3(3):e1836. doi: 10.1371/journal.pone.0001836.

Abstract

The assembly methods used for whole-genome shotgun (WGS) data have a major impact on the quality of resulting draft genomes. We present a novel algorithm to generate a set of "reliable" overlaps based on identifying repeat k-mers. To demonstrate the benefits of using reliable overlaps, we have created a version of the Phrap assembly program that uses only overlaps from a specific list. We call this version PhrapUMD. Integrating PhrapUMD and our "reliable-overlap" algorithm with the Baylor College of Medicine assembler, Atlas, we assemble the BACs from the Rattus norvegicus genome project. Starting with the same data as the Nov. 2002 Atlas assembly, we compare our results and the Atlas assembly to the 4.3 Mb of rat sequence in the 21 BACs that have been finished. Our version of the draft assembly of the 21 BACs increases the coverage of finished sequence from 93.4% to 96.3%, while simultaneously reducing the base error rate from 4.5 to 1.1 errors per 10,000 bases. There are a number of ways of assessing the relative merits of assemblies when the finished sequence is available. If one views the overall quality of an assembly as proportional to the inverse of the product of the error rate and sequence missed, then the assembly presented here is seven times better. The UMD Overlapper with options for reliable overlaps is available from the authors at http://www.genome.umd.edu. We also provide the changes to the Phrap source code enabling it to use only the reliable overlaps.
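The "seven times better" figure can be checked from the numbers quoted in the abstract. The sketch below is only an illustration: it assumes that "sequence missed" means 100% minus the reported coverage of finished sequence, and the function names `missed_fraction` and `quality` are hypothetical, not taken from the authors' software.

```python
def missed_fraction(coverage_percent: float) -> float:
    """Fraction of finished sequence not covered (assumed: 100% minus coverage)."""
    return 1.0 - coverage_percent / 100.0


def quality(coverage_percent: float, errors_per_10k: float) -> float:
    """Quality taken as 1 / (base error rate * fraction of sequence missed)."""
    error_rate = errors_per_10k / 10_000.0
    return 1.0 / (error_rate * missed_fraction(coverage_percent))


# Figures quoted in the abstract for the 21 finished BACs.
atlas = quality(coverage_percent=93.4, errors_per_10k=4.5)
umd = quality(coverage_percent=96.3, errors_per_10k=1.1)

ratio = umd / atlas
print(f"quality ratio (PhrapUMD-based / Atlas): {ratio:.1f}")
```

Under this reading the ratio is (6.6 × 4.5) / (3.7 × 1.1) ≈ 7.3, which agrees with the abstract's rounded claim of a sevenfold improvement.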

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0eaf/2266800/e1d8fab9b034/pone.0001836.g001.jpg
