Variance in estimated pairwise genetic distance under high versus low coverage sequencing: The contribution of linkage disequilibrium.

Author information

Shpak Max, Ni Yang, Lu Jie, Müller Peter

Affiliations

Sarah Cannon Research Institute, Nashville TN 37203, USA; Center for Systems and Synthetic Biology, University of Texas, Austin TX 78712, USA; Fresh Pond Research Institute, Cambridge MA 02140, USA.

Department of Statistics and Data Science, University of Texas, Austin TX 78712, USA.

Publication information

Theor Popul Biol. 2017 Oct;117:51-63. doi: 10.1016/j.tpb.2017.08.001. Epub 2017 Aug 24.

Abstract

The mean pairwise genetic distance among haplotypes is an estimator of the population mutation rate θ and a standard measure of variation in a population. With the advent of next-generation sequencing (NGS) methods, this and other population parameters can be estimated under different modes of sampling. One approach is to sequence individual genomes with high coverage and to calculate genetic distance over all sample pairs. A second approach, typically used for microbial samples or tumor cells, is to sequence a large number of pooled genomes, each at very low coverage. With low coverage, pairwise genetic distances are calculated across independently sampled sites rather than across individual genomes. In this study, we show that the variance in genetic distance estimates is reduced with low-coverage sampling if the mean pairwise linkage disequilibrium, weighted by allele frequencies, is positive. In practice, this means that if, on average, the most frequent alleles over pairs of loci are in positive linkage disequilibrium, low-coverage sequencing yields more precise estimates of θ, assuming similar per-site read depths. We show that this result holds under the expected distribution of allele frequencies and linkage disequilibria for an infinite-sites model at mutation-drift equilibrium. From simulations, we find that the conditions for reduced variance fail to hold only when variant alleles are few and at very low frequency. We apply these results to haplotype frequencies from a lung cancer tumor to compute the weighted linkage disequilibria and the expected error in estimated genetic distance under high versus low coverage.
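
The opening claim, that the mean pairwise distance among haplotypes estimates θ, can be illustrated with a minimal Python sketch (under the standard neutral infinite-sites model its expectation is θ). It computes the average number of differing sites over all pairs in a sample (Tajima's π) and, equivalently, the same quantity site by site as a sum of per-site pairwise diversities. The function names and the encoding of haplotypes as equal-length strings are illustrative assumptions, not the authors' code.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def mean_pairwise_distance(haplotypes: Sequence[str]) -> float:
    """Average number of differing sites over all pairs of haplotypes (Tajima's pi).

    Assumes all haplotypes are strings of equal length (hypothetical encoding).
    """
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    # Count mismatching positions for every unordered pair, then average.
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def per_site_diversity_sum(haplotypes: Sequence[str]) -> float:
    """The same statistic computed column by column.

    At each site, n/(n-1) * (1 - sum of squared allele frequencies) is the
    fraction of sample pairs that differ there; summing over sites gives pi.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    total = 0.0
    for column in zip(*haplotypes):  # one tuple of alleles per site
        freqs = [count / n for count in Counter(column).values()]
        total += n / (n - 1) * (1.0 - sum(f * f for f in freqs))
    return total


haps = ["0011", "0101", "0110"]  # toy sample: three haplotypes, four sites
print(mean_pairwise_distance(haps))  # 2.0
print(per_site_diversity_sum(haps))
```

For a complete sample the two functions agree up to floating-point rounding, which is why the comparison in this study concerns the variance of distance estimates under different sampling designs rather than their mean.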
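
Why linkage disequilibrium enters the variance can be seen from a short derivation under simplifying assumptions of our own: biallelic sites; two haplotypes drawn independently (with replacement) from the population haplotype frequencies; and, for low coverage, an independently drawn pair at every site with equal per-site depth. Let I_i indicate that the compared haplotypes differ at site i, let site i carry alleles A, a with frequencies p_i, q_i = 1 - p_i, let site j carry B, b with p_j, q_j, let p_AB, p_Ab, p_aB, p_ab be two-site haplotype frequencies, and let D_ij = p_AB - p_i p_j. The paper's own expressions and weighting may differ in detail.

```latex
\begin{aligned}
d &= \sum_{i=1}^{L} I_i, \qquad \Pr(I_i = 1) = 2 p_i q_i, \\
\operatorname{Var}_{\mathrm{high}}(d) &= \sum_{i} \operatorname{Var}(I_i) + \sum_{i \neq j} \operatorname{Cov}(I_i, I_j), \\
\operatorname{Var}_{\mathrm{low}}(d) &= \sum_{i} \operatorname{Var}(I_i), \\
\Pr(I_i = 1, I_j = 1) &= 2\,(p_{AB}\,p_{ab} + p_{Ab}\,p_{aB}) \\
&= 4 p_i q_i p_j q_j + 2 D_{ij} (p_i - q_i)(p_j - q_j) + 4 D_{ij}^2, \\
\operatorname{Cov}(I_i, I_j) &= 2 D_{ij} (p_i - q_i)(p_j - q_j) + 4 D_{ij}^2, \\
\operatorname{Var}_{\mathrm{high}}(d) - \operatorname{Var}_{\mathrm{low}}(d) &= \sum_{i \neq j} \left[ 2 D_{ij} (p_i - q_i)(p_j - q_j) + 4 D_{ij}^2 \right].
\end{aligned}
```

The term D_ij (p_i - q_i)(p_j - q_j) is linkage disequilibrium weighted by allele frequencies. It does not depend on which allele is labelled A or B, and it is positive when the more frequent alleles at the two sites are in positive linkage disequilibrium. Because 4 D_ij^2 is never negative, a positive weighted sum is sufficient, in this simplified model, for the low-coverage design to give the smaller variance.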
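
The high- versus low-coverage comparison can be made concrete with a small Python sketch. It is a simplified model of our own, not the authors' implementation. Given a table of haplotype frequencies, it computes the exact variance of the distance between two randomly drawn haplotypes when the same pair is compared at every site ("high coverage"), and when every site is compared in an independently drawn pair ("low coverage", idealised). It also evaluates a frequency-weighted LD sum that, in this model, equals the difference between the two variances. The '0'/'1' string encoding, sampling with replacement, and equal per-site depth are all assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def allele_freq(hap_freqs: dict[str, float], i: int) -> float:
    """Frequency of allele '1' at site i (hypothetical 0/1 encoding)."""
    return sum(f for h, f in hap_freqs.items() if h[i] == "1")


def distance_variances(hap_freqs: dict[str, float]) -> tuple[float, float]:
    """Exact (high-coverage, low-coverage) variances of pairwise distance."""
    haps = list(hap_freqs)
    n_sites = len(haps[0])
    # High coverage: one pair of haplotypes, drawn with replacement, at all sites.
    mean = second = 0.0
    for h1 in haps:
        for h2 in haps:
            w = hap_freqs[h1] * hap_freqs[h2]
            d = sum(a != b for a, b in zip(h1, h2))
            mean += w * d
            second += w * d * d
    var_high = second - mean ** 2
    # Low coverage: independent Bernoulli indicators with P(differ) = 2 p q.
    var_low = 0.0
    for i in range(n_sites):
        p = allele_freq(hap_freqs, i)
        h = 2 * p * (1 - p)
        var_low += h * (1 - h)
    return var_high, var_low


def ld_covariance_sum(hap_freqs: dict[str, float]) -> float:
    """Sum over ordered site pairs of 2 D (p_i - q_i)(p_j - q_j) + 4 D^2."""
    n_sites = len(next(iter(hap_freqs)))
    p = [allele_freq(hap_freqs, i) for i in range(n_sites)]
    total = 0.0
    for i, j in combinations(range(n_sites), 2):
        p_ij = sum(f for h, f in hap_freqs.items() if h[i] == "1" and h[j] == "1")
        D = p_ij - p[i] * p[j]
        cov = 2 * D * (2 * p[i] - 1) * (2 * p[j] - 1) + 4 * D * D
        total += 2 * cov  # counts both (i, j) and (j, i)
    return total


if __name__ == "__main__":
    linked = {"00": 0.5, "11": 0.5}  # two perfectly linked sites
    print(distance_variances(linked))  # (1.0, 0.5)
    rare = {"10": 0.1, "01": 0.1, "00": 0.8}  # two rare variants on different haplotypes
    print(distance_variances(rare), ld_covariance_sum(rare))
```

In the first example the sites are in complete positive LD and the high-coverage variance is twice the low-coverage one. In the second, two rare variants sit on different haplotypes, so the most frequent alleles are in negative LD and the high-coverage design has the smaller variance. This matches the abstract's note that the condition fails when variant alleles are few and at very low frequency.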
