ntLink: A Toolkit for De Novo Genome Assembly Scaffolding and Mapping Using Long Reads.

Affiliation

Canada's Michael Smith Genome Sciences Centre at BC Cancer, Vancouver, British Columbia.

Publication information

Curr Protoc. 2023 Apr;3(4):e733. doi: 10.1002/cpz1.733.

Abstract

With the increasing affordability and accessibility of genome sequencing data, de novo genome assembly is an important first step to a wide variety of downstream studies and analyses. Therefore, bioinformatics tools that enable the generation of high-quality genome assemblies in a computationally efficient manner are essential. Recent developments in long-read sequencing technologies have greatly benefited genome assembly work, including scaffolding, by providing long-range evidence that can aid in resolving the challenging repetitive regions of complex genomes. ntLink is a flexible and resource-efficient genome scaffolding tool that utilizes long-read sequencing data to improve upon draft genome assemblies built from any sequencing technologies, including the same long reads. Instead of using read alignments to identify candidate joins, ntLink utilizes minimizer-based mappings to infer how input sequences should be ordered and oriented into scaffolds. Recent improvements to ntLink have added important features such as overlap detection, gap-filling, and in-code scaffolding iterations. Here, we present three basic protocols demonstrating how to use each of these new features to yield highly contiguous genome assemblies, while still maintaining ntLink's proven computational efficiency. Further, as we illustrate in the alternate protocols, the lightweight minimizer-based mappings that enable ntLink scaffolding can also be utilized for other downstream applications, such as misassembly detection. With its modularity and multiple modes of execution, ntLink has broad benefit to the genomics community, from genome scaffolding and beyond. ntLink is an open-source project and is freely available from https://github.com/bcgsc/ntLink. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. 
Basic Protocol 1: ntLink scaffolding using overlap detection
Basic Protocol 2: ntLink scaffolding with gap-filling
Basic Protocol 3: Running in-code iterations of ntLink scaffolding
Alternate Protocol 1: Generating long-read to contig mappings with ntLink
Alternate Protocol 2: Using ntLink mappings for genome assembly correction with Tigmint-long
Support Protocol: Installing ntLink
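
The abstract notes that ntLink relies on minimizer-based mappings rather than read alignments. As a rough illustration of that general idea only, the sketch below computes ordered minimizers for two sequences and pairs up the ones they share. It is not ntLink's implementation: the function names, the CRC32 stand-in hash, and the small `k` and `w` values are all assumptions made for this example.

```python
import zlib
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_hash(kmer: str) -> int:
    # Assumption: CRC32 is a simple stand-in hash chosen for this sketch;
    # it is not the hash function used by ntLink.
    return zlib.crc32(canonical(kmer).encode("ascii"))


def minimizers(seq: str, k: int, w: int) -> list[tuple[int, int]]:
    """Return ordered (hash, position) pairs: for every window of w adjacent
    k-mers, keep the k-mer with the smallest hash (leftmost on ties)."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked: list[tuple[int, int]] = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        offset = min(range(w), key=window.__getitem__)
        entry = (window[offset], start + offset)
        # Adjacent windows often select the same k-mer; record it once.
        if not picked or picked[-1] != entry:
            picked.append(entry)
    return picked


def shared_minimizers(contig: str, read: str, k: int, w: int) -> list[tuple[int, int]]:
    """Return (contig_position, read_position) pairs for minimizers whose hash
    occurs exactly once in each sequence's minimizer list."""
    contig_mins = minimizers(contig, k, w)
    read_mins = minimizers(read, k, w)
    contig_counts = Counter(h for h, _ in contig_mins)
    read_counts = Counter(h for h, _ in read_mins)
    contig_pos = {h: p for h, p in contig_mins if contig_counts[h] == 1}
    return [
        (contig_pos[h], p)
        for h, p in read_mins
        if read_counts[h] == 1 and h in contig_pos
    ]


if __name__ == "__main__":
    # Hypothetical toy sequences; real inputs would be contigs and long reads.
    contig = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGCCATGGA"
    read = contig[7:30]
    for contig_position, read_position in shared_minimizers(contig, read, k=5, w=4):
        print(contig_position, read_position)
```

In this toy setting, a read copied from one strand of a contig yields shared minimizers whose contig and read positions differ by a constant offset, which is the kind of lightweight evidence a mapping step can use without computing a base-level alignment.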
