Hudson R R, Kaplan N L
Genetics. 1986 Aug;113(4):1057-76. doi: 10.1093/genetics/113.4.1057.
Within-population variation at the DNA level will rarely be studied by sequencing of loci of randomly chosen individuals. Instead, individuals will usually be chosen for sequencing based on some knowledge of their genotype. Data collected in this way require new sampling theory. Motivated by these observations, we have examined the sampling properties of a finite population model with two mutation processes and with no selection or recombination. One mutation process generates new alleles according to an infinite-alleles model, and the other generates polymorphisms at sites according to an infinite-sites model. A sample of n genes is considered. The stationary distribution of the number of segregating sites in a subsample from one of the allelic classes in the sample conditional on the allelic configuration of the sample is studied. A recursive scheme is developed to compute the moments of this distribution, and it is shown that the distribution is functionally independent of the number of additional alleles in the sample and their respective frequencies in the sample. For the case in which the sample contains only two alleles, the distribution of the number of segregating sites in a subsample containing both alleles conditional on the sample frequencies of the alleles is studied. The results are applied to the analysis of DNA sequences of two alleles found at the Adh locus of Drosophila melanogaster. No significant departure from the neutral model is detected.
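The conditional quantity described above can be explored numerically. The following is a minimal Monte Carlo sketch, not the authors' recursive scheme. It assumes the standard coalescent approximation to the finite population model, with time measured so that each pair of lineages coalesces at rate 1 and each lineage mutates at rate theta/2. The function names (`simulate_sample`, `segregating_sites`, `mean_S_in_class`) and the parameters `theta_allele` and `theta_site` are hypothetical, introduced only for illustration. Conditioning on the allelic configuration is done by simple rejection, which is practical only for small samples.

```python
import random
from collections import Counter


def poisson(rng, mean):
    """Draw a Poisson variate by counting rate-1 arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_sample(n, theta_allele, theta_site, rng):
    """Simulate n genes under a neutral coalescent with two mutation processes.

    Returns (alleles, sites): alleles[i] is the allele label of gene i
    (infinite-alleles process), and sites is a list of frozensets, one per
    infinite-sites mutation, holding the genes that carry it.
    """
    leaves = {i: frozenset([i]) for i in range(n)}
    birth = {i: 0.0 for i in range(n)}
    parent, branch_len = {}, {}
    active, t, next_id = list(range(n)), 0.0, n
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2.0)  # time to next coalescence
        a, b = rng.sample(active, 2)
        for child in (a, b):
            parent[child] = next_id
            branch_len[child] = t - birth[child]
            active.remove(child)
        leaves[next_id] = leaves[a] | leaves[b]
        birth[next_id] = t
        active.append(next_id)
        next_id += 1
    root = active[0]
    allele_of, new_label, sites = {root: 0}, 1, []
    # Parents always have larger ids than their children, so descending
    # order visits each parent before its children.
    for node in range(root - 1, -1, -1):
        length = branch_len[node]
        for _ in range(poisson(rng, theta_site * length / 2.0)):
            sites.append(leaves[node])  # new site carried by all descendants
        if poisson(rng, theta_allele * length / 2.0) > 0:
            allele_of[node] = new_label  # any allele mutation gives a new allele
            new_label += 1
        else:
            allele_of[node] = allele_of[parent[node]]
    return [allele_of[i] for i in range(n)], sites


def segregating_sites(sites, subsample):
    """Count sites that are polymorphic within the given subsample."""
    sub = frozenset(subsample)
    return sum(1 for carriers in sites if 0 < len(carriers & sub) < len(sub))


def mean_S_in_class(n, config, class_size, theta_allele, theta_site, reps, seed=0):
    """Mean number of segregating sites within an allelic class of size
    class_size, keeping only replicates whose class sizes match config."""
    rng = random.Random(seed)
    target = sorted(config)
    total = kept = 0
    for _ in range(reps):
        alleles, sites = simulate_sample(n, theta_allele, theta_site, rng)
        counts = Counter(alleles)
        if sorted(counts.values()) != target:
            continue  # rejection step: wrong allelic configuration
        label = next(a for a, c in counts.items() if c == class_size)
        members = [i for i, a in enumerate(alleles) if a == label]
        total += segregating_sites(sites, members)
        kept += 1
    return total / kept if kept else float("nan")


if __name__ == "__main__":
    # Two alleles at sample frequencies 4 and 2; sites within the larger class.
    print(mean_S_in_class(6, (4, 2), 4, theta_allele=1.0, theta_site=2.0, reps=20000))
```

Rerunning `mean_S_in_class` with the same class size but a different split of the remaining genes (for example `(4, 1, 1)` instead of `(4, 2)`) gives an informal check of the independence result stated in the abstract. Agreement between the two estimates is expected only up to Monte Carlo error.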