

Benchmarking of long-read sequencing, assemblers and polishers for yeast genome.

Affiliations

State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Science of the Ministry of Education, Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders of the Ministry of Education, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.

State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China.

Publication information

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac146.

Abstract

BACKGROUND

The long reads produced by third-generation sequencing substantially improve the quality of de novo genome assembly. However, their relatively high single-base error rate has been criticized. Sequencing accuracy and throughput continue to improve, and many advanced tools keep emerging. PacBio HiFi sequencing and Oxford Nanopore Technologies (ONT) PromethION are two up-to-date platforms offering low error rates and ultralong, high-throughput reads. There is therefore an urgent need to select appropriate sequencing platforms, sequencing depths and genome assembly tools for producing high-quality genomes in this era of explosive data production.

METHODS

We performed 455 and 88 de novo assemblies of yeast S288C from high-coverage ONT and HiFi datasets, respectively. The ONT assemblies covered 7 assemblers, each run without polishing or with one of 4 polishing pipelines, on 13 subsets of different depths; the HiFi assemblies covered 4 assemblers, with or without polishing, on 11 subsets of different depths. Assembly quality was evaluated with the Quality Assessment Tool (QUAST), Benchmarking Universal Single-Copy Orthologs (BUSCO) and the newly proposed Comprehensive_score (C_score). In addition, we applied four of the best-performing pipelines to assemble the genomes of nonreference yeast strains.
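The two assembly counts follow from a full factorial design: 7 assemblers × (4 polishing pipelines + no polishing) × 13 depth subsets = 455 for ONT, and 4 assemblers × 2 polishing options × 11 depth subsets = 88 for HiFi. The minimal Python sketch below enumerates such a grid to make that arithmetic explicit. All assembler, pipeline and subset labels are hypothetical placeholders, since the abstract reports only the counts and not the full tool lists or depth values.

```python
from itertools import product

# Hypothetical placeholder labels: the abstract gives only the counts.
ont_assemblers = [f"assembler_{i}" for i in range(1, 8)]            # 7 assemblers
ont_polishing = ["none"] + [f"polish_pipeline_{i}" for i in range(1, 5)]  # unpolished + 4 pipelines
ont_depths = [f"subset_{i}" for i in range(1, 14)]                  # 13 depth subsets

hifi_assemblers = [f"assembler_{i}" for i in range(1, 5)]           # 4 assemblers
hifi_polishing = ["none", "polished"]                                # with or without polishing
hifi_depths = [f"subset_{i}" for i in range(1, 12)]                 # 11 depth subsets


def design_grid(assemblers, polishing, depths):
    """Return every (assembler, polishing, depth) combination, one per assembly run."""
    return list(product(assemblers, polishing, depths))


ont_runs = design_grid(ont_assemblers, ont_polishing, ont_depths)
hifi_runs = design_grid(hifi_assemblers, hifi_polishing, hifi_depths)

print(len(ont_runs), len(hifi_runs))
```

Each tuple in the grid stands for one assembly to run and score, so the grid size is simply the product of the number of options along each axis.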

RESULTS

The choice of assembler plays an essential role in genome construction, especially for low-depth datasets. For ONT datasets, Flye outperformed the other tools according to the C_score. Polishing with Pilon improved the accuracy of the preassemblies and polishing with Medaka improved their continuity, and a pipeline combining the two performed well on most quality metrics. For HiFi datasets, Flye and NextDenovo performed better than the other tools, and polishing was also necessary. Sufficient sequencing depth is required for high-quality genome construction: more than 80X for ONT and more than 20X for HiFi datasets.
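The depth thresholds translate into raw data volumes through the usual relation depth = total sequenced bases / genome size. The Python sketch below assumes a genome size of about 12.1 Mb for S288C (an approximation, not a figure given in the abstract), and all helper names are hypothetical. It also computes the fraction of data to keep when drawing a lower-depth subset from a larger dataset, which is one common way to build a depth series; the abstract does not state how its subsets were produced.

```python
# Assumption: S288C genome size approximated as 12.1 Mb (not stated in the abstract).
GENOME_SIZE_BP = 12_100_000


def bases_for_depth(depth, genome_size_bp=GENOME_SIZE_BP):
    """Total sequenced bases needed to reach a given mean depth."""
    return depth * genome_size_bp


def depth_from_bases(total_bases, genome_size_bp=GENOME_SIZE_BP):
    """Mean depth provided by a dataset containing `total_bases` sequenced bases."""
    return total_bases / genome_size_bp


def subsample_fraction(target_depth, total_bases, genome_size_bp=GENOME_SIZE_BP):
    """Fraction of the data (by bases) to keep so the subset reaches `target_depth`."""
    available = depth_from_bases(total_bases, genome_size_bp)
    if target_depth > available:
        raise ValueError(f"target {target_depth}X exceeds available {available:.1f}X")
    return target_depth / available


if __name__ == "__main__":
    print(f"ONT 80X  -> {bases_for_depth(80) / 1e9:.2f} Gb")
    print(f"HiFi 20X -> {bases_for_depth(20) / 1e9:.2f} Gb")
    # Hypothetical 2.42 Gb dataset, i.e. 200X under the assumed genome size.
    print(f"keep fraction for 80X: {subsample_fraction(80, 2_420_000_000):.2f}")
```

Under these assumptions, 80X of ONT data corresponds to roughly 0.97 Gb of reads and 20X of HiFi data to roughly 0.24 Gb; the figures scale linearly with whatever genome size is actually used.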

