Wick Ryan R, Judd Louise M, Gorrie Claire L, Holt Kathryn E
Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Victoria, Australia.
PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. eCollection 2017 Jun.
The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate "hybrid" assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
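To illustrate what "semi-global" alignment means in general, the following is a minimal sketch of the textbook dynamic-programming formulation, in which unaligned overhangs at either end of either sequence are not penalised. It is not Unicycler's aligner: it aligns a read to a plain string rather than to an assembly graph. The function name `semi_global_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def semi_global_score(read: str, ref: str,
                      match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best end-gap-free (semi-global) alignment score.

    Illustrative only: scoring values are assumed, and the reference is a
    plain string rather than an assembly graph.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # The first row and first column stay at 0, so a leading overhang of
    # either sequence costs nothing.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            same = read[i - 1] == ref[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in the reference
            left = score[i][j - 1] + gap    # gap in the read
            score[i][j] = max(diag, up, left)
    # Trailing overhangs are free as well: the alignment may end anywhere in
    # the last row (read fully consumed) or last column (reference consumed).
    best_last_row = max(score[rows - 1])
    best_last_col = max(row[cols - 1] for row in score)
    return max(best_last_row, best_last_col)


if __name__ == "__main__":
    # The read lies entirely inside the reference: four matches, no penalties.
    print(semi_global_score("ACGT", "TTACGTTT"))  # 4
```

Unlike a local alignment, this formulation never clips the alignment in the middle of both sequences: it always runs until one of the two sequences ends. This suits reads that either sit inside a contig or hang over its end.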