Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki.

Affiliation

Department of Evolution, Ecology, and Behavior, University of Illinois, Urbana-Champaign, Champaign, IL 61801, USA.

Publication information

G3 (Bethesda). 2022 Nov 4;12(11). doi: 10.1093/g3journal/jkac192.

Abstract

A robust genome assembly is required for any genome-based research. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have passed through at least 3 phases: (1) short-read-only, (2) short- and long-read hybrid, and (3) long-read-only assemblies. Each phase has its own error model. We hypothesized that hidden short-read scaffolding errors and erroneous long-read contigs degrade the quality of short- and long-read hybrid assemblies. We assembled the genome of Trematomus borchgrevinki from data generated during each of the 3 phases and assessed the quality problems we encountered. We developed strategies such as k-mer-assembled region replacement, parameter optimization, and long-read sampling to address the error models. We demonstrated that a k-mer-based strategy improved short-read assemblies as measured by Benchmarking Universal Single-Copy Orthologs (BUSCO), while mate-pair libraries introduced hidden scaffolding errors and perturbed BUSCO scores. Furthermore, we found that although hybrid assemblies can achieve higher contiguity, they tend to suffer from lower quality. In addition, we found that long-read-only assemblies can be optimized for contiguity by subsampling length-restricted raw reads. Our results indicate that long-read contig assembly is currently the best choice and that assemblies from phases 1 and 2 were of lower quality.
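
To make the idea of "subsampling length-restricted raw reads" more concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes read lengths are already available as a list of integers (parsing FASTQ or BAM is left out), and the function names, the length window, and the base-count target are all hypothetical choices for illustration. It also includes the standard N50 statistic, which is commonly used to summarize contiguity.

```python
import random
from typing import List, Optional, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def subsample_length_restricted(
    read_lengths: Sequence[int],
    min_len: int,
    max_len: Optional[int],
    target_bases: int,
    seed: int = 0,
) -> List[int]:
    """Hypothetical helper: keep reads whose length lies in
    [min_len, max_len] (no upper bound if max_len is None), then draw
    reads in random order until at least target_bases bases have been
    collected or the eligible reads run out.
    Returns the sorted indices of the selected reads."""
    eligible = [
        i for i, n in enumerate(read_lengths)
        if n >= min_len and (max_len is None or n <= max_len)
    ]
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    rng.shuffle(eligible)
    selected: List[int] = []
    bases = 0
    for i in eligible:
        if bases >= target_bases:
            break
        selected.append(i)
        bases += read_lengths[i]
    return sorted(selected)


if __name__ == "__main__":
    # Toy read lengths in bases; real data would come from a read file.
    lengths = [500, 12000, 30000, 15000, 800, 20000]
    picked = subsample_length_restricted(lengths, 10000, None, 40000)
    print("selected read indices:", picked)
    print("N50 of selected reads:", n50([lengths[i] for i in picked]))
```

In practice the selected subset would be passed to a long-read assembler, and the length window and coverage target would be varied to see which combination yields the most contiguous assembly without sacrificing quality.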

Similar articles

2. Improved assembly of noisy long reads by k-mer validation.
Genome Res. 2016 Dec;26(12):1710-1720. doi: 10.1101/gr.209247.116. Epub 2016 Oct 7.

References

6. A comprehensive evaluation of long read error correction methods.
BMC Genomics. 2020 Dec 21;21(Suppl 6):889. doi: 10.1186/s12864-020-07227-0.
9. Optical map guided genome assembly.
BMC Bioinformatics. 2020 Jul 6;21(1):285. doi: 10.1186/s12859-020-03623-1.
10. Long-read human genome sequencing and its applications.
Nat Rev Genet. 2020 Oct;21(10):597-614. doi: 10.1038/s41576-020-0236-x. Epub 2020 Jun 5.
