ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long-Read Genome Assemblies.

Affiliations

Canada's Michael Smith Genome Sciences Center, Vancouver, BC, Canada.

Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada.

Publication information

Curr Protoc. 2022 May;2(5):e442. doi: 10.1002/cpz1.442.

DOI:10.1002/cpz1.442

PMID:35567771

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9196995/

Abstract

High-quality genome assemblies are crucial to many biological studies, and utilizing long sequencing reads can help achieve higher assembly contiguity. While long reads can resolve complex and repetitive regions of a genome, their relatively high associated error rates are still a major limitation. Long reads generally produce draft genome assemblies with lower base quality, which must be corrected with a genome polishing step. Hybrid genome polishing solutions can greatly improve the quality of long-read genome assemblies by utilizing more accurate short reads to validate bases and correct errors. Currently available hybrid polishing methods rely on read alignments, and are therefore memory-intensive and do not scale well to large genomes. Here we describe ntEdit+Sealer, an alignment-free, k-mer-based genome finishing protocol that employs memory-efficient Bloom filters. The protocol includes ntEdit for correcting base errors and small indels, and for marking potentially problematic regions, then Sealer for filling both assembly gaps and problematic regions flagged by ntEdit. ntEdit+Sealer produces highly accurate, error-corrected genome assemblies, and is available as a Makefile pipeline from https://github.com/bcgsc/ntedit_sealer_protocol. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol: Automated long-read genome finishing with short reads

Support Protocol: Selecting optimal values for k-mer lengths (k) and Bloom filter size (b)
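The alignment-free idea in the abstract can be illustrated with a minimal Python sketch. It is not the ntEdit or Sealer implementation: the class `KmerBloomFilter`, the helper functions, the toy sequences, and the hashing scheme (salted BLAKE2b) are all assumptions chosen for illustration, and reverse-complement (canonical) k-mers are ignored for brevity. Short-read k-mers are loaded into a Bloom filter, and draft-assembly k-mers missing from the filter are reported as candidate error positions. The last function gives the standard Bloom filter false-positive estimate, which is one reason the filter size (b) has to be chosen with the number of distinct k-mers in mind.

```python
import hashlib
import math


class KmerBloomFilter:
    """Toy Bloom filter over k-mers (illustrative; not ntEdit's data structure)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # One bit position per hash function, derived by salting BLAKE2b
        # with the hash index (an assumption made for this sketch).
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def unsupported_positions(draft, bloom, k):
    """Start positions of draft k-mers that are absent from the filter."""
    return [i for i, kmer in enumerate(kmers(draft, k)) if kmer not in bloom]


def expected_false_positive_rate(num_bits, num_hashes, num_items):
    """Standard estimate: (1 - exp(-h * n / m)) ** h."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# Hypothetical toy data: two short reads and a draft with one substitution.
K = 5
reads = ["ACGTACGTGA", "CGTACGTGAC"]
draft = "ACGTACCTGAC"

bloom = KmerBloomFilter(num_bits=1 << 16, num_hashes=3)
for read in reads:
    for kmer in kmers(read, K):
        bloom.add(kmer)

flagged = unsupported_positions(draft, bloom, K)
print("draft k-mer start positions lacking short-read support:", flagged)
```

A Bloom filter never yields false negatives, so a k-mer reported as absent is truly absent from the reads; false positives are possible and become more likely as the filter fills, which is the trade-off the estimate above quantifies.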

Similar articles

ntEdit+Sealer: Efficient Targeted Error Resolution and Automated Finishing of Long-Read Genome Assemblies.

Curr Protoc. 2022 May;2(5):e442. doi: 10.1002/cpz1.442.

ntEdit: scalable genome sequence polishing.

Bioinformatics. 2019 Nov 1;35(21):4430-4432. doi: 10.1093/bioinformatics/btz400.

NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads.

Genomics Proteomics Bioinformatics. 2024 May 9;22(1). doi: 10.1093/gpbjnl/qzad009.

ntLink: A Toolkit for De Novo Genome Assembly Scaffolding and Mapping Using Long Reads.

Curr Protoc. 2023 Apr;3(4):e733. doi: 10.1002/cpz1.733.

Sealer: a scalable gap-closing application for finishing draft genomes.

BMC Bioinformatics. 2015 Jul 25;16(1):230. doi: 10.1186/s12859-015-0663-4.

LongStitch: high-quality genome assembly correction and scaffolding using long reads.

BMC Bioinformatics. 2021 Oct 30;22(1):534. doi: 10.1186/s12859-021-04451-7.

Polishing the Oxford Nanopore long-read assemblies of bacterial pathogens with Illumina short reads to improve genomic analyses.

Genomics. 2021 May;113(3):1366-1377. doi: 10.1016/j.ygeno.2021.03.018. Epub 2021 Mar 11.

BlockPolish: accurate polishing of long-read assembly via block divide-and-conquer.

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab405.

Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case.

BMC Genomics. 2018 Dec 29;19(1):977. doi: 10.1186/s12864-018-5348-8.

DENTIST-using long reads for closing assembly gaps at high accuracy.

Gigascience. 2022 Jan 25;11. doi: 10.1093/gigascience/giab100.

Cited by

Nuclear genome assembly of Leucinodes orbonalis (Lepidoptera: Crambidae) collected from the Philippines.

J Insect Sci. 2025 May 9;25(3). doi: 10.1093/jisesa/ieaf066.

GoldPolish-target: targeted long-read genome assembly polishing.

BMC Bioinformatics. 2025 Mar 7;26(1):78. doi: 10.1186/s12859-025-06091-7.

aaHash: recursive amino acid sequence hashing.

Bioinform Adv. 2023 Nov 11;3(1):vbad162. doi: 10.1093/bioadv/vbad162. eCollection 2023.

Linear time complexity de novo long read genome assembly with GoldRush.

Nat Commun. 2023 May 22;14(1):2906. doi: 10.1038/s41467-023-38716-x.

ntLink: A Toolkit for De Novo Genome Assembly Scaffolding and Mapping Using Long Reads.

Curr Protoc. 2023 Apr;3(4):e733. doi: 10.1002/cpz1.733.
