Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.

Author information

Ryan R. Wick, Louise M. Judd, Claire L. Gorrie, Kathryn E. Holt

Affiliation

Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Victoria, Australia.

Publication information

PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. eCollection 2017 Jun.

Abstract

The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
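The abstract says Unicycler aligns long reads with a semi-global aligner but does not describe that aligner's internals, so the following is not Unicycler's implementation. It is a minimal sketch of one common variant of semi-global alignment against a single linear sequence rather than an assembly graph. The whole read must be aligned, but unaligned stretches at either end of the reference cost nothing. The function name `semi_global_score` and the linear scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions.

```python
def semi_global_score(read: str, ref: str,
                      match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning all of `read` to any stretch of `ref`.

    Reference positions before and after the aligned stretch are free
    (no end-gap penalty on `ref`), but every read base must be aligned
    to a reference base or to a gap.
    """
    m = len(ref)
    # prev[j] = best score aligning read[:i] so that it ends at ref[:j].
    # Row 0 is all zeros: skipping leading reference bases is free.
    prev = [0] * (m + 1)
    for i in range(1, len(read) + 1):
        # Column 0: read[:i] aligned to nothing, i.e. i gap penalties.
        curr = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            curr[j] = max(diag,              # align read[i-1] with ref[j-1]
                          prev[j] + gap,     # read base against a gap
                          curr[j - 1] + gap) # reference base against a gap
        prev = curr
    # Taking the best value in the last row makes trailing reference bases free.
    return max(prev)


if __name__ == "__main__":
    print(semi_global_score("ACGT", "TTACGTTT"))  # read found exactly inside ref
    print(semi_global_score("ACGA", "TTACGTTT"))  # one mismatch at the read's end
```

Making end gaps free on the reference is what lets a short read sit anywhere inside a longer sequence without being penalised for the overhang. A fully global alignment would instead charge for every unaligned reference base.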
