Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki.

Author information

Department of Evolution, Ecology, and Behavior, University of Illinois, Urbana-Champaign, Champaign, IL 61801, USA.

Publication information

G3 (Bethesda). 2022 Nov 4;12(11). doi: 10.1093/g3journal/jkac192.

DOI: 10.1093/g3journal/jkac192

PMID: 35904764

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9635638/

Abstract

For any genome-based research, a robust genome assembly is required. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have passed through at least 3 phases: (1) short-read-only, (2) short- and long-read hybrid, and (3) long-read-only assemblies. Each phase has its own error model. We hypothesized that hidden short-read scaffolding errors and erroneous long-read contigs degrade the quality of short- and long-read hybrid assemblies. We assembled the genome of Trematomus borchgrevinki from data generated during each of the 3 phases and assessed the quality problems we encountered. We developed strategies such as k-mer-assembled region replacement, parameter optimization, and long-read sampling to address the error models. We demonstrated that a k-mer-based strategy improved short-read assemblies as measured by Benchmarking Universal Single-Copy Orthologs (BUSCO), while mate-pair libraries introduced hidden scaffolding errors and perturbed BUSCO scores. Furthermore, we found that although hybrid assemblies can generate higher contiguity, they tend to suffer from lower quality. In addition, we found that long-read-only assemblies can be optimized for contiguity by subsampling length-restricted raw reads. Our results indicate that long-read contig assembly is the current best choice and that assemblies from phases 1 and 2 were of lower quality.
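The abstract mentions subsampling length-restricted raw reads and judging assemblies by contiguity. It does not give the thresholds, coverage targets, or tools the authors used. The following Python sketch only illustrates the general idea under stated assumptions: `subsample_reads`, `min_len`, `max_len`, and `target_bases` are hypothetical names chosen for this example, and N50 is used here as one common contiguity measure, not necessarily the one reported in the paper.

```python
import random
from typing import Iterable, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence); a hypothetical in-memory layout


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def subsample_reads(
    reads: List[Read],
    min_len: int,
    max_len: int,
    target_bases: int,
    seed: int = 0,
) -> List[Read]:
    """Keep reads whose length lies in [min_len, max_len], then draw them in
    random order until at least target_bases bases have been collected.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    eligible = [r for r in reads if min_len <= len(r[1]) <= max_len]
    rng.shuffle(eligible)
    picked: List[Read] = []
    total = 0
    for read in eligible:
        if total >= target_bases:
            break
        picked.append(read)
        total += len(read[1])
    return picked
```

In practice, one would presumably try several length windows and base targets, assemble each subsample, and compare the resulting contig N50 values alongside a completeness measure such as BUSCO.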
