

ploidyfrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs.

Affiliation

State Key Laboratory of Earth Surface Processes and Resource Ecology and Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China.

Publication information

Mol Ecol Resour. 2023 Feb;23(2):499-510. doi: 10.1111/1755-0998.13720. Epub 2022 Nov 1.

DOI:10.1111/1755-0998.13720
PMID:36239149
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10092044/
Abstract

Polyploidy is ubiquitous and its consequences are complex and variable. A change of ploidy level generally influences genetic diversity and results in morphological, physiological and ecological differences between cells or organisms with different ploidy levels. To avoid cumbersome experiments and take advantage of the less biased information provided by the vast amounts of genome sequencing data, computational tools for ploidy estimation are urgently needed. Until now, although a few such tools have been developed, many aspects of this estimation, such as the requirement of a reference genome, the lack of informative results and objective inferences, and the influence of false positives from errors and repeats, need further improvement. We have developed ploidyfrost, a de Bruijn graph-based method, to estimate ploidy levels from whole genome sequencing data sets without a reference genome. ploidyfrost provides a visual representation of allele frequency distribution generated using the ggplot2 package as well as quantitative results using the Gaussian mixture model. In addition, it takes advantage of colouring information encoded in coloured de Bruijn graphs to analyse multiple samples simultaneously and to flexibly filter putative false positives. We evaluated the performance of ploidyfrost by analysing highly heterozygous or repetitive samples of Cyclocarya paliurus and a complex allooctoploid sample of Fragaria × ananassa. Moreover, we demonstrated that the accuracy of analysis results can be improved by constraining a threshold such as Cramér's V coefficient on variant features, which may significantly reduce the side effects of sequencing errors and annoying repeats on the graphical structure constructed.
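The abstract describes inferring ploidy from the distribution of allele frequencies at variant sites, quantifying that distribution with a Gaussian mixture model, and filtering variant sites with a threshold such as Cramér's V. The Python below is a minimal sketch of those two ideas, not ploidyfrost's code. It assumes each biallelic variant (a "bubble" in the de Bruijn graph) has already been reduced to the frequency of one branch, and it scores each candidate ploidy p with an equal-weight Gaussian mixture whose means are fixed at the peaks i/p expected for a biallelic site. The peak width `SIGMA`, the candidate range 2 to 8, the helper names and the toy frequencies are all hypothetical. The `cramers_v` helper is a plain textbook computation for an r × c contingency table; the abstract does not say how ploidyfrost builds that table or which side of the threshold it keeps, so no specific filter rule is implied.

```python
import math

# Hypothetical peak width; a real tool would estimate this from the data.
SIGMA = 0.05


def expected_peaks(ploidy: int) -> list[float]:
    """Allele frequencies i/p (i = 1..p-1) expected at a biallelic site."""
    return [i / ploidy for i in range(1, ploidy)]


def log_likelihood(freqs: list[float], ploidy: int, sigma: float = SIGMA) -> float:
    """Log-likelihood of the observed frequencies under an equal-weight
    Gaussian mixture whose means are fixed at the expected peaks."""
    peaks = expected_peaks(ploidy)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for f in freqs:
        density = sum(
            norm * math.exp(-((f - mu) ** 2) / (2.0 * sigma**2)) for mu in peaks
        ) / len(peaks)
        total += math.log(max(density, 1e-300))  # guard against underflow
    return total


def best_ploidy(freqs: list[float], candidates=range(2, 9)) -> tuple[int, dict[int, float]]:
    """Return the candidate ploidy with the highest log-likelihood, plus all scores."""
    scores = {p: log_likelihood(freqs, p) for p in candidates}
    return max(scores, key=scores.get), scores


def cramers_v(table: list[list[float]]) -> float:
    """Cramér's V (no bias correction) for an r x c contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Toy, made-up branch frequencies (not data from the paper).
    diploid_like = [0.48, 0.52, 0.50, 0.47, 0.55]
    triploid_like = [0.33, 0.34, 0.65, 0.67, 0.31, 0.70]
    print(best_ploidy(diploid_like)[0])   # 2
    print(best_ploidy(triploid_like)[0])  # 3
    # Hypothetical 2 x 2 table with perfect association between rows and columns.
    print(cramers_v([[10, 0], [0, 10]]))  # 1.0
```

Fixing the means at i/p, rather than fitting free means by expectation maximisation, keeps the comparison between ploidy levels direct: each model spreads equal weight over p − 1 peaks, so a higher ploidy whose peak set merely contains the observed peaks (for example ploidy 4 or 6 for diploid-like data centred on 0.5) receives a lower likelihood than the simplest matching model.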



Similar articles

1
ploidyfrost: Reference-free estimation of ploidy level from whole genome sequencing data based on de Bruijn graphs.
Mol Ecol Resour. 2023 Feb;23(2):499-510. doi: 10.1111/1755-0998.13720. Epub 2022 Nov 1.
2
Integrating long-range connectivity information into de Bruijn graphs.
Bioinformatics. 2018 Aug 1;34(15):2556-2565. doi: 10.1093/bioinformatics/bty157.
3
BrownieAligner: accurate alignment of Illumina sequencing data to de Bruijn graphs.
BMC Bioinformatics. 2018 Sep 4;19(1):311. doi: 10.1186/s12859-018-2319-7.
4
Assembly of long error-prone reads using de Bruijn graphs.
Proc Natl Acad Sci U S A. 2016 Dec 27;113(52):E8396-E8405. doi: 10.1073/pnas.1604560113. Epub 2016 Dec 12.
5
nQuire: a statistical framework for ploidy estimation using next generation sequencing.
BMC Bioinformatics. 2018 Apr 4;19(1):122. doi: 10.1186/s12859-018-2128-z.
6
AbsCN-seq: a statistical method to estimate tumor purity, ploidy and absolute copy numbers from next-generation sequencing data.
Bioinformatics. 2014 Apr 15;30(8):1056-1063. doi: 10.1093/bioinformatics/btt759. Epub 2014 Jan 2.
7
Lossless indexing with counting de Bruijn graphs.
Genome Res. 2022 Sep 27;32(9):1754-1764. doi: 10.1101/gr.276607.122.
8
A space and time-efficient index for the compacted colored de Bruijn graph.
Bioinformatics. 2018 Jul 1;34(13):i169-i177. doi: 10.1093/bioinformatics/bty292.
9
Detection of simple and complex de novo mutations with multiple reference sequences.
Genome Res. 2020 Aug;30(8):1154-1169. doi: 10.1101/gr.255505.119. Epub 2020 Aug 19.
10
Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph.
BMC Bioinformatics. 2015 Sep 14;16:288. doi: 10.1186/s12859-015-0709-7.

Cited by

1
Spatial ploidy inference using quantitative imaging.
bioRxiv. 2025 Mar 17:2025.03.11.642217. doi: 10.1101/2025.03.11.642217.
2
Variant calling in polyploids for population and quantitative genetics.
Appl Plant Sci. 2024 Jul 17;12(4):e11607. doi: 10.1002/aps3.11607. eCollection 2024 Jul-Aug.
3
nQuack: An R package for predicting ploidal level from sequence data using site-based heterozygosity.
Appl Plant Sci. 2024 Jul 14;12(4):e11606. doi: 10.1002/aps3.11606. eCollection 2024 Jul-Aug.
4
LocoGSE, a sequence-based genome size estimator for plants.
Front Plant Sci. 2024 Mar 14;15:1328966. doi: 10.3389/fpls.2024.1328966. eCollection 2024.
5
Development of a risk model to predict prognosis in breast cancer based on cGAS-STING-related genes.
Front Genet. 2023 Mar 27;14:1121018. doi: 10.3389/fgene.2023.1121018. eCollection 2023.