polishCLR: A Nextflow Workflow for Polishing PacBio CLR Genome Assemblies.

Affiliations

USDA, Agricultural Research Service, Jamie Whitten Delta States Research Center, Genomics and Bioinformatics Research Unit, Stoneville, Mississippi.

Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee.

Publication information

Genome Biol Evol. 2023 Mar 3;15(3). doi: 10.1093/gbe/evad020.

DOI:10.1093/gbe/evad020

PMID:36792366

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9985148/

Abstract

Long-read sequencing has revolutionized genome assembly, yielding highly contiguous, chromosome-level contigs. However, assemblies from some third-generation long-read technologies, such as Pacific Biosciences (PacBio) continuous long reads (CLR), have a high error rate. Such errors can be corrected with short reads through a process called polishing. Although best practices for polishing non-model de novo genome assemblies were recently described by the Vertebrate Genome Project (VGP) Assembly community, there is a need for a publicly available, reproducible workflow that can be easily implemented and run on a conventional high-performance computing environment. Here, we describe polishCLR (https://github.com/isugifNF/polishCLR), a reproducible Nextflow workflow that implements best practices for polishing assemblies made from CLR data. PolishCLR can be initiated from several input options that extend best practices to suboptimal cases. It also provides re-entry points at several key steps: after identifying duplicate haplotypes with purge_dups, at a break for scaffolding if data are available, and throughout multiple rounds of polishing and evaluation with Arrow and FreeBayes. PolishCLR is containerized and publicly available for the greater assembly community as a tool to complete assemblies from existing, error-prone long-read data.
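
The idea of re-entry points in a multi-stage polishing workflow can be illustrated with a minimal Python sketch. This is not polishCLR's code or command-line interface: the stage names, their order, and the helper `stages_from` are all hypothetical, chosen only to mirror the steps named in the abstract (purge_dups, a scaffolding break, and rounds of Arrow and FreeBayes polishing).

```python
from typing import List

# Hypothetical stage names, loosely following the steps named in the abstract.
# The order and granularity are assumptions, not polishCLR's actual step list.
STAGES: List[str] = [
    "arrow_round_1",
    "purge_dups",
    "scaffolding_break",
    "arrow_round_2",
    "freebayes_round_1",
    "freebayes_round_2",
]


def stages_from(entry_point: str, stages: List[str] = STAGES) -> List[str]:
    """Return the stages that still need to run when resuming at entry_point."""
    if entry_point not in stages:
        raise ValueError(f"unknown re-entry point: {entry_point!r}")
    return stages[stages.index(entry_point):]


if __name__ == "__main__":
    # Resuming after an external scaffolding step skips everything before it.
    print(stages_from("arrow_round_2"))
```

In this sketch, a user who has scaffolded the assembly outside the workflow would resume at the stage following the break rather than rerunning the earlier stages.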

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba7b/9985148/ef259a4898bb/evad020f1.jpg
