Robertson A, Hill W G
Genetics. 1984 Aug;107(4):703-18. doi: 10.1093/genetics/107.4.703.
An analysis is made of the distribution of deviations from Hardy-Weinberg proportions with k alleles and of estimates of inbreeding coefficients (f) obtained from these deviations. If f is small, the best estimate of f in large samples is shown to be $2\sum_i (T_{ii}/N_i)/(k - 1)$, where $T_{ii}$ is an unbiased measure of the excess of the ith homozygote and $N_i$ the number of the ith allele in the sample [frequency = $N_i/(2N)$]. No extra information is obtained from the $T_{ij}$, where these are departures of numbers of heterozygotes from expectation. Alternatively, the best estimator can be computed from the $T_{ij}$, ignoring the $T_{ii}$. Also (1) the variance of the estimate of f equals $1/[N(k - 1)]$ when all individuals in the sample are unrelated, and the test for f = 0 with 1 d.f. is given by the ratio of the estimate to its standard error; (2) the variance is reduced if some alleles are rare; and (3) if the sample consists of full-sib families of size n, the variance is increased by a proportion (n - 1)/4 but is not increased by a half-sib relationship. If f is not small, the structure of the population is of critical importance. (1) If the inbreeding is due to a proportion of inbred matings in an otherwise random-breeding population, f as determined from homozygote excess is the same for all genes and expressions are given for its sampling variance. (2) If the homozygote excess is due to population admixture, f is not the same for all genes. The above estimator is probably close to the best for all f values.
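The small-f estimator and its test can be illustrated with a minimal Python sketch. The abstract does not spell out how the excess of the ith homozygote is computed, so the sketch assumes it is the observed homozygote count minus the expectation conditional on allele counts, N_i(N_i - 1)/[2(2N - 1)]. The function name `estimate_f`, its input format (one allele pair per individual) and the `full_sib_family_size` parameter are hypothetical and chosen only for illustration. The variance 1/[N(k - 1)] and the full-sib inflation by a proportion (n - 1)/4 are taken from the abstract and apply only when f is small.

```python
from collections import Counter
from math import sqrt


def estimate_f(genotypes, full_sib_family_size=1):
    """Estimate f from homozygote excess at one locus (small-f case).

    genotypes: list of (allele_a, allele_b) pairs, one per individual.
    full_sib_family_size: family size n; 1 means unrelated individuals.
    Returns (f_hat, standard_error, ratio of f_hat to standard_error).
    """
    n_individuals = len(genotypes)
    allele_counts = Counter()   # N_i: copies of allele i in the sample
    homozygotes = Counter()     # observed number of ii homozygotes
    for a, b in genotypes:
        allele_counts[a] += 1
        allele_counts[b] += 1
        if a == b:
            homozygotes[a] += 1

    k = len(allele_counts)
    if k < 2:
        raise ValueError("need at least two alleles in the sample")

    total = 0.0
    for allele, n_i in allele_counts.items():
        # Assumed definition of T_ii: observed minus expected homozygotes,
        # with the expectation conditional on the sample allele counts.
        expected = n_i * (n_i - 1) / (2 * (2 * n_individuals - 1))
        t_ii = homozygotes[allele] - expected
        total += t_ii / n_i

    f_hat = 2 * total / (k - 1)

    # Variance under f = 0 for unrelated individuals: 1 / [N (k - 1)],
    # increased by a proportion (n - 1) / 4 for full-sib families of size n.
    variance = 1.0 / (n_individuals * (k - 1))
    variance *= 1 + (full_sib_family_size - 1) / 4
    se = sqrt(variance)
    return f_hat, se, f_hat / se


if __name__ == "__main__":
    # Two alleles in exact Hardy-Weinberg proportions: estimate near zero.
    hw_sample = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
    print(estimate_f(hw_sample))
    # No heterozygotes at all: estimate near one.
    inbred_sample = [("A", "A")] * 50 + [("B", "B")] * 50
    print(estimate_f(inbred_sample))
```

The returned ratio corresponds to the 1 d.f. test of f = 0 described above. Because the variance expression holds only under f = 0 and in large samples, the ratio should be read as a test statistic rather than as a basis for a confidence interval when f is large.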