Liao Xiangyu, Zhu Wufei, Liu Chaoyun
Department of Oncology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, Yichang, China.
Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, Yichang, China.
Front Genet. 2024 Aug 22;15:1451730. doi: 10.3389/fgene.2024.1451730. eCollection 2024.
In next-generation sequencing datasets, various characteristics can be extracted through k-mer-based analysis. Among these characteristics, genome size (GS) is one that can be estimated with relative ease, yet achieving satisfactory accuracy, especially in the context of heterozygosity, remains a challenge.
In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction.
We evaluated GSET on both simulated and real datasets. The experimental results demonstrate that the tool estimates genome size with greater precision than state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce usable results.
The processing model of GSET diverges from the popular data fitting models used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. GSET is freely available at https://github.com/Xingyu-Liao/GSET.
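For readers unfamiliar with histogram-based estimation, the general idea can be illustrated with a minimal sketch. This is not GSET's model or its correction term, which the abstract does not specify; it is the common textbook baseline, assuming a k-mer histogram given as a mapping from multiplicity to the number of distinct k-mers. The function name `estimate_genome_size` and the toy histogram are hypothetical. Low-multiplicity k-mers before the first valley are treated as sequencing errors and discarded, and the remaining k-mer total is divided by the multiplicity of the main peak.

```python
def estimate_genome_size(hist):
    """Naive genome size estimate from a k-mer histogram.

    hist: dict mapping k-mer multiplicity -> number of distinct k-mers
          (hypothetical input format, chosen for illustration).
    Returns (estimated_size, valley, peak).
    """
    depths = sorted(hist)

    # Walk down the initial error slope until the counts start rising again;
    # the last point before the rise is taken as the error/signal valley.
    valley = None
    for prev, cur in zip(depths, depths[1:]):
        if hist[cur] > hist[prev]:
            valley = prev
            break
    if valley is None:
        raise ValueError("no valley found: histogram never rises")

    # Main (homozygous) peak: the most frequent multiplicity at or past the valley.
    signal = [d for d in depths if d >= valley]
    peak = max(signal, key=lambda d: hist[d])

    # Total k-mer occurrences attributed to the genome, divided by peak depth.
    total_kmers = sum(d * hist[d] for d in signal)
    return total_kmers / peak, valley, peak


# Toy histogram: an error spike at multiplicity 1-2 and a main peak at 5.
toy = {1: 1000, 2: 200, 3: 50, 4: 80, 5: 150, 6: 80, 7: 20}
size, valley, peak = estimate_genome_size(toy)
print(valley, peak, size)  # 3 5 368.0
```

A hard cutoff at the valley discards genuine low-coverage k-mers and keeps some erroneous ones, and a single-peak assumption breaks down when heterozygosity produces a second peak at half the depth. These are the weaknesses that motivate correcting the histogram rather than simply truncating it.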