findGSEP: estimating genome size of polyploid species using k-mer frequencies.

Affiliations

School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

Research Institute of Xi'an Jiaotong University, Zhejiang, Hangzhou 311200, China.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae647.

DOI:10.1093/bioinformatics/btae647
PMID:39475440
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11552620/
Abstract

SUMMARY

Estimating genome size from k-mer frequencies plays a fundamental role in designing genome sequencing and analysis projects, but it has remained challenging for polyploid species, i.e., those with ploidy p > 2. To address this, we introduce "findGSEP," which is based on iterative curve fitting of k-mer frequencies. Specifically, it first disentangles up to p normal distributions by analyzing k-mer frequencies in whole-genome sequencing data of the focal species. Second, it computes the sizes of the genomic regions related to 1 to p (homologous) chromosome(s) from the respective fitted curves, from which it infers the full polyploid and average haploid genome sizes. "findGSEP" can handle any ploidy level p and infers genome size more accurately than other well-known tools, as shown by tests on simulated and real genomic sequencing data from various species, including octoploids.
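The two steps in the summary can be illustrated with a minimal Python sketch. It is not the findGSEP implementation. The function names (`fit_peaks`, `genome_sizes`), the moment-based normal fit inside fixed depth windows, the assumption that the per-haplotype sequencing depth `depth` is already known, and the synthetic tetraploid histogram are all hypothetical simplifications chosen for illustration. The sketch peels one normal curve per copy number off a k-mer frequency histogram, lowest depth first, and then sums the fitted areas weighted by copy number.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_peaks(hist, depth, ploidy):
    """Peel off one normal curve per copy number, lowest depth first.

    hist[f] is the number of distinct k-mers observed f times (hist[0] unused).
    depth is the per-haplotype sequencing depth, assumed known here.
    Returns a list of (copies, area, mu, sigma) tuples.
    """
    residual = [float(c) for c in hist]
    peaks = []
    for copies in range(1, ploidy + 1):
        # Window centred on the expected depth of k-mers present in `copies` haplotypes.
        lo = max(1, int((copies - 0.5) * depth))
        hi = min(len(hist) - 1, int((copies + 0.5) * depth))
        xs = list(range(lo, hi + 1))
        ws = [max(residual[f], 0.0) for f in xs]
        area = sum(ws)
        if area <= 0.0:
            peaks.append((copies, 0.0, copies * depth, 1.0))
            continue
        # Moment estimates of the normal curve inside the window.
        mu = sum(f * w for f, w in zip(xs, ws)) / area
        var = sum(w * (f - mu) ** 2 for f, w in zip(xs, ws)) / area
        sigma = max(math.sqrt(var), 1e-6)
        peaks.append((copies, area, mu, sigma))
        # Subtract the fitted curve so its tail does not leak into the next fit.
        for f in range(1, len(residual)):
            residual[f] -= area * normal_pdf(f, mu, sigma)
    return peaks


def genome_sizes(peaks, ploidy):
    """Return (full polyploid size, average haploid size) in k-mer positions.

    Assumption: a k-mer shared by `copies` haplotypes contributes `copies`
    positions to the full polyploid genome.
    """
    full = sum(copies * area for copies, area, _mu, _sigma in peaks)
    return full, full / ploidy


# Synthetic tetraploid histogram (made-up numbers, no sequencing-error peak).
true_counts = [40000, 30000, 20000, 10000]  # distinct k-mers in 1..4 haplotypes
depth, ploidy = 20, 4
hist = [0.0] * 121
for i, n in enumerate(true_counts, start=1):
    for f in range(1, len(hist)):
        hist[f] += n * normal_pdf(f, i * depth, 1.5 * math.sqrt(i))

peaks = fit_peaks(hist, depth, ploidy)
full_size, haploid_size = genome_sizes(peaks, ploidy)
print(round(full_size), round(haploid_size))
```

Because the first window starts at half the haploid depth, very low-frequency k-mers (typically sequencing errors in real data) are excluded from the fit. A real histogram would also need the depth to be estimated rather than supplied, and overlapping peaks would call for a proper joint or iterative least-squares fit instead of this sequential moment fit.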

AVAILABILITY AND IMPLEMENTATION

"findGSEP" is implemented as a web server, freely available at http://146.56.237.198:3838/findGSEP/. It is also implemented as an R package for parallel processing of multiple samples. Source code and a tutorial on installation and usage are available at https://github.com/sperfu/findGSEP.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eed3/11552620/13fcd0a798d2/btae647f1.jpg
