GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes.

Affiliations

Johns Hopkins University, Baltimore, MD, USA.

University of Lausanne, Lausanne, Switzerland.

Publication information

Nat Commun. 2020 Mar 18;11(1):1432. doi: 10.1038/s41467-020-14998-3.

DOI:10.1038/s41467-020-14998-3

PMID:32188846

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7080791/

Abstract

An important assessment prior to genome assembly and related analyses is genome profiling, where the k-mer frequencies within raw sequencing reads are analyzed to estimate major genome characteristics such as size, heterozygosity, and repetitiveness. Here we introduce GenomeScope 2.0 (https://github.com/tbenavi1/genomescope2.0), which applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. We describe and evaluate a practical implementation of the polyploid-aware mixture model that quickly and accurately infers genome properties across thousands of simulated and several real datasets spanning a broad range of complexity. We also present a method called Smudgeplot (https://github.com/KamilSJaron/smudgeplot) to visualize and estimate the ploidy and genome structure of a genome by analyzing heterozygous k-mer pairs. We successfully apply the approach to systems of known variable ploidy levels in the Meloidogyne genus and the extreme case of octoploid Fragaria × ananassa.
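
The k-mer profiling idea in the abstract can be illustrated with a minimal sketch. It counts canonical k-mers in a set of reads, builds the k-mer frequency spectrum (how many distinct k-mers occur at each coverage), and makes a naive genome size estimate as total k-mers divided by the peak coverage. This is not GenomeScope 2.0's polyploid-aware mixture model, which fits the whole spectrum; the function names, the `min_cov` cutoff, and the simulated data are assumptions made only for illustration, and real datasets would normally be counted with a dedicated k-mer counter rather than pure Python.

```python
from collections import Counter
import random

_BASES = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across all reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts):
    """Map each coverage value to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def naive_genome_size(spectrum, min_cov=2):
    """Estimate (peak coverage, genome size) as total k-mers / peak coverage.

    k-mers below min_cov are ignored as likely sequencing errors
    (an illustrative cutoff, not a value taken from the paper).
    """
    usable = {cov: n for cov, n in spectrum.items() if cov >= min_cov}
    if not usable:
        raise ValueError("no k-mers at or above min_cov")
    peak = max(usable, key=usable.get)
    total_kmers = sum(cov * n for cov, n in usable.items())
    return peak, total_kmers / peak


if __name__ == "__main__":
    # Hypothetical demo: a random 2 kb "genome" sampled with error-free 100 bp reads.
    rng = random.Random(0)
    genome = "".join(rng.choices("ACGT", k=2000))
    starts = [rng.randrange(len(genome) - 100 + 1) for _ in range(600)]
    reads = [genome[s:s + 100] for s in starts]
    peak, size = naive_genome_size(kmer_spectrum(count_kmers(reads, 21)))
    print(f"peak coverage: {peak}, naive size estimate: {size:.0f} bp")
```

The single-peak estimate is only reasonable for a homozygous, low-repeat genome. Heterozygous and polyploid genomes produce several peaks in the spectrum, which is the situation the mixture model described in the abstract is designed to handle.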

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/94b1/7080791/eccbbe6859c1/41467_2020_14998_Fig1_HTML.jpg

Similar articles

GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes.

Nat Commun. 2020 Mar 18;11(1):1432. doi: 10.1038/s41467-020-14998-3.

Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

DNA Res. 2014;21(2):169-81. doi: 10.1093/dnares/dst049. Epub 2013 Nov 26.

GenomeScope: fast reference-free genome profiling from short reads.

Bioinformatics. 2017 Jul 15;33(14):2202-2204. doi: 10.1093/bioinformatics/btx153.

A New Perspective on Polyploid Fragaria (Strawberry) Genome Composition Based on Large-Scale, Multi-Locus Phylogenetic Analysis.

Genome Biol Evol. 2017 Dec 1;9(12):3433-3448. doi: 10.1093/gbe/evx214.

nPhase: an accurate and contiguous phasing method for polyploids.

Genome Biol. 2021 Apr 29;22(1):126. doi: 10.1186/s13059-021-02342-x.

Plastid genomes reveal recurrent formation of allopolyploid Fragaria.

Am J Bot. 2018 May;105(5):862-874. doi: 10.1002/ajb2.1085. Epub 2018 May 24.

Evolutionary origins and dynamics of octoploid strawberry subgenomes revealed by dense targeted capture linkage maps.

Genome Biol Evol. 2014 Dec 4;6(12):3295-313. doi: 10.1093/gbe/evu261.

Tracing the Diploid Ancestry of the Cultivated Octoploid Strawberry.

Mol Biol Evol. 2021 Jan 23;38(2):478-485. doi: 10.1093/molbev/msaa238.

Tracking the evolutionary history of polyploidy in Fragaria L. (strawberry): new insights from phylogenetic analyses of low-copy nuclear genes.

Mol Phylogenet Evol. 2009 Jun;51(3):515-30. doi: 10.1016/j.ympev.2008.12.024. Epub 2009 Jan 4.

flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning.

J Comput Biol. 2022 Feb;29(2):195-211. doi: 10.1089/cmb.2021.0436. Epub 2022 Jan 17.

Cited by

The reference genome of the human diploid cell line RPE-1.

Nat Commun. 2025 Sep 12;16(1):7751. doi: 10.1038/s41467-025-62428-z.

The genome sequence of the Marsh Pennywort, L. (Apiales: Araliaceae).

Wellcome Open Res. 2025 Jul 28;10:370. doi: 10.12688/wellcomeopenres.24582.1. eCollection 2025.

A nearly complete haplotype-phased genome assembly of nerve plant () provides insights into leaf color evolution.

Hortic Res. 2025 Jun 26;12(9):uhaf154. doi: 10.1093/hr/uhaf154. eCollection 2025 Sep.

The genome sequence of (Scopoli, 1763) (Lepidoptera: Geometridae).

Wellcome Open Res. 2025 Jul 30;10:392. doi: 10.12688/wellcomeopenres.24664.1. eCollection 2025.

The genome sequence of the Black Hairstreak, (Linnaeus, 1758) (Lepidoptera: Lycaenidae).

Wellcome Open Res. 2025 Jul 28;10:377. doi: 10.12688/wellcomeopenres.24619.1. eCollection 2025.

The genome sequence of a flea beetle, Aubé, 1843.

Wellcome Open Res. 2025 Jun 2;10:297. doi: 10.12688/wellcomeopenres.24269.1. eCollection 2025.

Haplotype-resolved genomes of reveal nuclear differentiation, TE-mediated variation, and saprotrophic potential.

IMA Fungus. 2025 Aug 28;16:e161411. doi: 10.3897/imafungus.16.161411. eCollection 2025.

The genome sequence of the Bordered Sallow moth, (Hufnagel, 1766).

Wellcome Open Res. 2025 Apr 23;10:208. doi: 10.12688/wellcomeopenres.24001.1. eCollection 2025.

The genome sequence of the Mountain Bumble Bee, Smith, 1849 (Hymenoptera: Apidae).

Wellcome Open Res. 2025 Jul 25;10:367. doi: 10.12688/wellcomeopenres.24577.1. eCollection 2025.

The genome sequence of the Dusky Meadow Brown, (Lepidoptera: Nymphalidae).

Wellcome Open Res. 2025 Jul 31;10:395. doi: 10.12688/wellcomeopenres.24656.1. eCollection 2025.

References

The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum.

Gigascience. 2017 Nov 1;6(11):1-7. doi: 10.1093/gigascience/gix097.
