Tajima F
Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan.
Genetics. 1996 Jul;143(3):1457-65. doi: 10.1093/genetics/143.3.1457.
The expectations of the average number of nucleotide differences per site (pi), the proportion of segregating sites (s), the minimum number of mutations per site (s*), and some other quantities were derived under finite site models with and without rate variation among sites, where the finite site models include Jukes and Cantor's model, the equal-input model, and Kimura's model. The gamma distribution was used as the model of rate variation. The results indicate that if the distribution parameter alpha is small, the effect of rate variation on these quantities is substantial, so that estimates based on the infinite site model substantially underestimate theta, where theta = 4Nv, N is the effective population size, and v is the mutation rate per site per generation. New methods for estimating theta, based on the finite site models with and without rate variation, are also presented. Using these methods, the underestimation can be corrected.
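The direction of the bias can be illustrated with a minimal sketch. It is not the paper's own estimator. It assumes a neutral, constant-size population and Jukes and Cantor's model, under which averaging the two-sequence difference probability over an exponential coalescence time gives E(pi) = 3*theta / (3 + 4*theta). Rate variation is assumed to be a gamma-distributed site multiplier with mean 1 and shape alpha, averaged here by Monte Carlo. All function names are hypothetical.

```python
import random


def expected_pi_jc(theta):
    """E(pi) under Jukes-Cantor with equal rates (assumed neutral coalescent)."""
    return 3.0 * theta / (3.0 + 4.0 * theta)


def theta_from_pi_jc(pi):
    """Invert E(pi) = 3*theta / (3 + 4*theta) to recover theta from pi."""
    if not 0.0 <= pi < 0.75:
        raise ValueError("pi must lie in [0, 0.75) under Jukes-Cantor")
    return 3.0 * pi / (3.0 - 4.0 * pi)


def expected_pi_jc_gamma(theta, alpha, draws=50_000, seed=0):
    """Monte Carlo average of E(pi) over gamma-distributed site rates.

    Each site's rate multiplier r has shape alpha and scale 1/alpha (mean 1),
    so the average mutation rate per site is unchanged.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        r = rng.gammavariate(alpha, 1.0 / alpha)
        total += expected_pi_jc(theta * r)
    return total / draws


if __name__ == "__main__":
    theta = 0.05
    pi_equal = expected_pi_jc(theta)
    pi_gamma = expected_pi_jc_gamma(theta, alpha=0.5)
    # The infinite site estimator takes pi itself as the estimate of theta.
    print("true theta:", theta)
    print("E(pi), equal rates:", pi_equal)
    print("E(pi), gamma rates (alpha = 0.5):", pi_gamma)
    print("theta recovered from equal-rate pi:", theta_from_pi_jc(pi_equal))
```

Because E(pi) is a concave function of theta, both the finite site correction and the averaging over unequal rates pull E(pi) below theta, which is why an estimate that equates pi with theta comes out too low, and more so when alpha is small.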