Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions.

Affiliations

Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, USA.

Department of Computer Science, Systems Group, ETH Zürich, Zürich, Switzerland.

Publication information

Brief Bioinform. 2019 Jul 19;20(4):1542-1559. doi: 10.1093/bib/bby017.

Abstract

Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, the high error rates of the technology pose a challenge when generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they must overcome these high error rates. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. Understanding where the current tools do not perform well is important for developing better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) The read-to-read overlap-finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has lower memory usage and is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for a quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability. We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. In addition, with the help of the bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of nanopore sequencing technology.
