Center for Research in Agricultural Genomics (CRAG), UAB, 08193, Bellaterra, Spain.
Mol Ecol. 2013 Nov;22(22):5561-76. doi: 10.1111/mec.12522. Epub 2013 Oct 28.
Next-generation sequencing of pooled samples is an effective approach for studies of variability and differentiation in populations. In this paper we provide a comprehensive set of estimators of the most common statistics in population genetics based on the frequency spectrum, namely the Watterson estimator θW, nucleotide pairwise diversity Π, Tajima's D, Fu and Li's D and F, Fay and Wu's H, the McDonald-Kreitman and HKA tests, and FST, corrected for sequencing errors and ascertainment bias. In a simulation study, we show that pool and individual θ estimates are highly correlated and discuss how the performance of the statistics varies with read depth and sample size in different evolutionary scenarios. As an application, we reanalyse sequences from Drosophila mauritiana and from an evolution experiment in Drosophila melanogaster. These methods are useful for population genetic projects with limited budgets, for the study of communities of individuals that are hard to isolate, and for autopolyploid species.
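For orientation, the frequency-spectrum statistics named in the abstract can be illustrated with their classical forms for individually sequenced samples. The sketch below is not the pooled-sample estimators of the paper: it applies no correction for sequencing errors, read depth, or ascertainment bias. It only shows how θW, Π and Tajima's D are computed from an unfolded site frequency spectrum using the standard formulas. The function names and the input layout (`sfs[i-1]` is the number of sites where the derived allele appears in `i` of `n` sequences) are assumptions made for this illustration.

```python
from math import sqrt


def harmonic_sums(n):
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1..n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def theta_watterson(sfs):
    """Classical Watterson estimator: segregating sites divided by a1."""
    n = len(sfs) + 1  # sample size implied by the spectrum length
    a1, _ = harmonic_sums(n)
    return sum(sfs) / a1


def theta_pi(sfs):
    """Classical pairwise diversity: mean number of differences per pair."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2.0
    return sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs


def tajimas_d(sfs):
    """Classical Tajima's D (no pooling or sequencing-error correction)."""
    n = len(sfs) + 1
    s = sum(sfs)
    if s == 0:
        return float("nan")  # undefined without segregating sites
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (theta_pi(sfs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

With a spectrum proportional to the neutral expectation, for example `[12, 6, 4, 3]` for `n = 5`, both estimators give 12 and D is zero up to floating-point error. A spectrum made only of singletons, such as `[10, 0, 0, 0]`, gives a negative D.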