Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, British Columbia.
Curr Protoc. 2023 Apr;3(4):e733. doi: 10.1002/cpz1.733.
With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technology, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling, and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink offers broad benefit to the genomics community, for genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: ntLink scaffolding using overlap detection
Basic Protocol 2: ntLink scaffolding with gap-filling
Basic Protocol 3: Running in-code iterations of ntLink scaffolding
Alternate Protocol 1: Generating long-read to contig mappings with ntLink
Alternate Protocol 2: Using ntLink mappings for genome assembly correction with Tigmint-long
Support Protocol: Installing ntLink
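
The abstract describes ntLink as relying on minimizer-based mappings rather than read alignments. The Python sketch below illustrates the general idea of minimizer sketching and of mapping a long read to contigs through shared minimizers. It is not ntLink's implementation: the function names `minimizers` and `map_read_to_contigs` are hypothetical, and the sketch assumes forward-strand, lexicographically ordered k-mers with tiny `k` and `w` values, whereas a production tool would typically use hashed, strand-aware k-mers and much larger parameters.

```python
from collections import defaultdict


def minimizers(seq, k, w):
    """Return ordered (position, k-mer) minimizers of seq.

    For each window of w consecutive k-mers, the lexicographically
    smallest k-mer is selected (leftmost on ties). Consecutive windows
    that select the same position are reported once.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = []
    for start in range(len(kmers) - w + 1):
        # min() returns the first smallest index, so ties go to the leftmost k-mer
        pos = min(range(start, start + w), key=lambda j: kmers[j])
        if not selected or selected[-1][0] != pos:
            selected.append((pos, kmers[pos]))
    return selected


def map_read_to_contigs(read, contigs, k, w):
    """Count minimizers shared between a read and each contig.

    contigs is a dict of {name: sequence}. Returns {name: shared count}
    for every contig sharing at least one minimizer k-mer with the read.
    """
    # Index: minimizer k-mer -> set of contigs whose sketch contains it
    index = defaultdict(set)
    for name, seq in contigs.items():
        for _, kmer in minimizers(seq, k, w):
            index[kmer].add(name)

    hits = defaultdict(int)
    for _, kmer in minimizers(read, k, w):
        for name in index.get(kmer, ()):
            hits[name] += 1
    return dict(hits)
```

A read that shares minimizers with two different contigs provides long-range evidence that those contigs may be neighbours, which is the kind of signal a scaffolder can use to propose a join. Because only a sparse subset of k-mers is compared, this style of mapping is much lighter than base-level alignment, at the cost of lower resolution.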