findGSEP: estimating genome size of polyploid species using k-mer frequencies.

Affiliations

School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

Research Institute of Xi'an Jiaotong University, Zhejiang, Hangzhou 311200, China.

Publication information

Bioinformatics. 2024 Nov 1;40(11). doi: 10.1093/bioinformatics/btae647.

Abstract

SUMMARY

Estimating genome size using k-mer frequencies, which plays a fundamental role in designing genome sequencing and analysis projects, has remained challenging for polyploid species (i.e., ploidy p > 2). To address this, we introduce "findGSEP," which is based on iterative curve fitting of k-mer frequencies. Specifically, it first disentangles up to p normal distributions by analyzing k-mer frequencies from whole-genome sequencing data of the focal species. Second, it uses each fitted curve to compute the sizes of genomic regions related to 1∼p (homologous) chromosome(s), from which it infers the full polyploid and average haploid genome sizes. "findGSEP" can handle any ploidy level p and infers more accurate genome sizes than other well-known tools, as shown by tests on simulated and real genome sequencing data from various species, including octoploids.
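The two steps above can be illustrated with a minimal Python sketch built on simplifying assumptions. It is not findGSEP's implementation (findGSEP is distributed as an R package and web server, and its exact fitting procedure is not described in this abstract). The sketch assumes that k-mers present in i of the p homologous copies form a normal peak centred at i·λ, where λ is the per-copy k-mer coverage, and that each peak's standard deviation is proportional to its mean (the factor `sigma_frac` is a hypothetical parameter). For each candidate λ on a grid, the peak weights are fitted by linear least squares, and the λ with the smallest residual is kept. The fitted weight A_i approximates the number of distinct k-mers shared by i copies, so the full polyploid genome size is Σ i·A_i (equivalently, total k-mers divided by λ) and the average haploid size is that sum divided by p. All function names are hypothetical, and the low-depth sequencing-error peak that real tools exclude is ignored.

```python
import numpy as np


def gaussian(x, mu, sigma):
    """Normal density at x; sums to ~1 over unit-spaced depths if the peak is well inside the range."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def design_matrix(depth, ploidy, lam, sigma_frac):
    """One column per copy number i = 1..ploidy, modelling a peak at depth i*lam.

    Assumption (hypothetical): peak width grows with depth, sd = sigma_frac * mean.
    """
    cols = [gaussian(depth, i * lam, sigma_frac * i * lam)
            for i in range(1, ploidy + 1)]
    return np.column_stack(cols)


def fit_peaks(depth, counts, ploidy, lam_grid, sigma_frac=0.15):
    """Grid-search the per-copy coverage lam; fit peak weights by least squares.

    Returns (lam, weights), where weights[i-1] approximates the number of
    distinct k-mers present in i of the `ploidy` homologous copies.
    """
    best = None
    for lam in lam_grid:
        X = design_matrix(depth, ploidy, lam, sigma_frac)
        w, *_ = np.linalg.lstsq(X, counts, rcond=None)
        w = np.clip(w, 0.0, None)  # crude stand-in for a non-negative fit
        rss = float(np.sum((X @ w - counts) ** 2))
        if best is None or rss < best[0]:
            best = (rss, lam, w)
    return best[1], best[2]


def genome_sizes(weights, ploidy):
    """Full polyploid size = sum_i i * A_i; average haploid size = full / ploidy."""
    copies = np.arange(1, ploidy + 1)
    full = float(np.sum(copies * weights))
    return full, full / ploidy


if __name__ == "__main__":
    # Synthetic, noise-free tetraploid histogram (made-up numbers for illustration).
    depth = np.arange(1, 201, dtype=float)
    true_w = np.array([1e6, 5e5, 2e5, 3e6])
    counts = design_matrix(depth, 4, 20.0, 0.15) @ true_w
    lam, w = fit_peaks(depth, counts, 4, np.arange(10.0, 30.5, 0.5))
    full, hap = genome_sizes(w, 4)
    print(f"lambda={lam}, full={full:.3e} bp, haploid={hap:.3e} bp")
```

With real data, `depth` and `counts` would come from a k-mer counter's histogram (for example, the output of `jellyfish histo`) after the error peak has been trimmed. The Gaussian peak shape, the grid search over λ, and the clipping of negative weights are simplifications chosen to keep the sketch short.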

AVAILABILITY AND IMPLEMENTATION

"findGSEP" was implemented as a web server, which is freely available at http://146.56.237.198:3838/findGSEP/. "findGSEP" was also implemented as an R package for parallel processing of multiple samples. The source code and a tutorial on its installation and usage are available at https://github.com/sperfu/findGSEP.
