Allele age estimators designed for whole genome datasets show only a moderate reduction in performance when applied to whole exome datasets.

Author information

Pivirotto Alyssa, Peles Noah, Hey Jody

Affiliation

Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA.

Publication information

bioRxiv. 2025 Mar 3:2024.02.01.578465. doi: 10.1101/2024.02.01.578465.

Abstract

Personalized genomics in the healthcare system is becoming increasingly accessible as the cost of sequencing decreases. As the number of sequenced genomes grows, larger numbers of rare variants are being discovered, prompting important initiatives to identify their functional impacts in relation to disease phenotypes. One way to characterize these variants is to estimate the time at which the mutation entered the population. However, allele age estimators such as those implemented in the programs Relate, Genealogical Estimator of Variant Age (GEVA), and Runtc were developed on the assumption that datasets include the entire genome. We examined the performance of each of these estimators on simulated exome data under a neutral constant population size model, as well as under population expansion and background selection models. We found that each provides usable estimates of allele age from whole-exome datasets. Relate performs best of the three estimators, with Pearson coefficients of 0.83 and 0.73 (with respect to true simulated values, for the neutral constant and expansion population models, respectively), with 12 percent and 20 percent decreases in correlation between whole-genome and whole-exome estimates. Of the three estimators, Relate is best able to parallelize and yield quick results with few resources. However, Relate can currently scale only to thousands of samples, so it cannot match the hundreds of thousands of samples now being released. While more work is needed to expand the capabilities of current methods of estimating allele age, these methods show only a modest decrease in performance when estimating the age of mutations from whole-exome data.
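The comparison metric described in the abstract, a Pearson coefficient between estimated and true simulated allele ages, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ages are synthetic random numbers rather than output from Relate, GEVA, or Runtc, the noise levels are arbitrary, and the helper names `pearson` and `relative_decrease` are hypothetical. The abstract does not state whether the reported 12 and 20 percent decreases are relative or absolute, so the sketch prints both quantities.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def relative_decrease(r_genome, r_exome):
    """Fractional drop in correlation going from genome to exome estimates."""
    return (r_genome - r_exome) / r_genome


# Synthetic stand-ins (assumption): "true" ages in generations, plus two
# noisy estimates. The exome-like estimate is given more noise purely to
# mimic a loss of information; these are not real estimator results.
rng = random.Random(0)
true_ages = [rng.uniform(10, 10_000) for _ in range(500)]
genome_est = [age * math.exp(rng.gauss(0, 0.3)) for age in true_ages]
exome_est = [age * math.exp(rng.gauss(0, 0.8)) for age in true_ages]

r_genome = pearson(true_ages, genome_est)
r_exome = pearson(true_ages, exome_est)

print(f"whole-genome r = {r_genome:.2f}")
print(f"whole-exome  r = {r_exome:.2f}")
print(f"absolute drop  = {r_genome - r_exome:.2f}")
print(f"relative drop  = {relative_decrease(r_genome, r_exome):.0%}")
```

With real data, `true_ages` would come from the simulation and the two estimate lists from running an estimator on the full simulated genome and on its exome subset, restricted to the same set of variants.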

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f07d/11887768/da3414082267/nihpp-2024.02.01.578465v2-f0001.jpg
