Pluzhnikov A, Donnelly P
Department of Statistics, University of Chicago, Illinois 60637, USA.
Genetics. 1996 Nov;144(3):1247-62. doi: 10.1093/genetics/144.3.1247.
Two commonly used measures of genetic diversity for intraspecies DNA sequence data are based, respectively, on the number of segregating sites and on the average number of pairwise nucleotide differences. Expressions are derived for their variance in the presence of intragenic recombination for a panmictic population of fixed size that is at neutral equilibrium at the region sequenced. We show that, in contrast to the slow decrease in variance with increasing sample size, if the recombination rate is nonzero, the asymptotic rate of decrease of variance with increasing sequence length, for fixed sample size, is quite rapid. In particular, it is close to that which would be obtained by sequencing independent chromosome regions. The correlation between measures of diversity from linked regions is also examined. For a given total number of bases sequenced in a particular region, optimal sequencing strategies are derived. These typically involve sequencing relatively few (three to ten) long copies of the region. Under optimal strategies, the variances of the two measures are very similar for most parameter values considered. Results concerning optimal sequencing strategies will be sensitive to gross departures from the underlying assumptions, such as population bottlenecks, selective sweeps, and substantial population substructure.
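The two measures named above can be computed directly from a sample of aligned sequences. The Python sketch below assumes equal-length, gap-free sequences and uses hypothetical function names. Its variance functions are the classical zero-recombination baselines (Watterson 1975 for Watterson's estimator S/a1; Tajima 1983 for the mean pairwise difference π). They are not the recombination-dependent expressions derived in this paper, but they show the "slow decrease in variance with increasing sample size" that the abstract contrasts against: Var(π) tends to θ/3 + 2θ²/9 rather than zero, and the Watterson variance shrinks only about as fast as 1/ln n.

```python
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns carrying more than one distinct base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """pi: nucleotide differences averaged over all n(n-1)/2 sequence pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def harmonic_sums(n):
    """a1 = sum_{i=1}^{n-1} 1/i and a2 = sum_{i=1}^{n-1} 1/i^2."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    return a1, a2


def watterson_theta(seqs):
    """Watterson's estimator of theta: S / a1."""
    a1, _ = harmonic_sums(len(seqs))
    return segregating_sites(seqs) / a1


def var_watterson_no_recomb(theta, n):
    """Var(S / a1) assuming NO recombination (Watterson 1975 baseline)."""
    a1, a2 = harmonic_sums(n)
    return theta / a1 + a2 * theta**2 / a1**2


def var_pi_no_recomb(theta, n):
    """Var(pi) assuming NO recombination (Tajima 1983 baseline)."""
    return ((n + 1) * theta / (3 * (n - 1))
            + 2 * (n * n + n + 3) * theta**2 / (9 * n * (n - 1)))


if __name__ == "__main__":
    # Toy alignment of four sequences (illustrative data, not from the paper).
    sample = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
    print(segregating_sites(sample))          # 2
    print(mean_pairwise_differences(sample))  # 7/6
    # Baseline variances for a hypothetical theta = 5 across sample sizes.
    for n in (5, 10, 50, 100, 1000):
        print(n, var_watterson_no_recomb(5.0, n), var_pi_no_recomb(5.0, n))
```

With n = 2 both estimators reduce to the single pairwise difference, so the two baseline variances coincide (θ + θ²). Capturing the rapid decrease with sequence length under recombination would require the paper's own expressions or coalescent simulation, which this sketch does not attempt.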