Slatkin M, Excoffier L
Department of Integrative Biology, University of California, Berkeley 94720-3140, USA.
Heredity (Edinb). 1996 Apr;76 (Pt 4):377-83. doi: 10.1038/hdy.1996.55.
We generalize an approach suggested by Hill (Heredity, 33, 229-239, 1974) for testing for significant association among alleles at two loci when only genotype and not haplotype frequencies are available. The principle is to use the Expectation-Maximization (EM) algorithm to resolve double heterozygotes into haplotypes and then apply a likelihood ratio test in order to determine whether the resolutions of haplotypes are significantly nonrandom, which is equivalent to testing whether there is statistically significant linkage disequilibrium between loci. The EM algorithm in this case relies on the assumption that genotype frequencies at each locus are in Hardy-Weinberg proportions. This method can accommodate X-linked loci and samples from haplodiploid species. We use three methods for testing significance of the likelihood ratio: the empirical distribution in a large number of randomized data sets, the χ² approximation for the distribution of likelihood ratios, and the Z² test. The performance of each method is evaluated by applying it to simulated data sets and comparing the tail probability with the tail probability from Fisher's exact test applied to the actual haplotype data. For realistic sample sizes (50-150 individuals) all three methods perform well with two or three alleles per locus, but only the empirical distribution is adequate when there are five to eight alleles per locus, as is typical of hypervariable loci such as microsatellites. The method is applied to a data set of 32 microsatellite loci in a Finnish population and the results confirm the theoretical predictions. We conclude that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.
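The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and it makes several simplifying assumptions: it covers only the two-allele case (the situation treated by Hill, not the multi-allele generalization), it encodes each individual as a pair (copies of allele A at locus 1, copies of allele B at locus 2), and it builds the randomized data sets by shuffling the locus-2 genotypes among individuals, which is one plausible reading of "randomized data sets". All function names are hypothetical. The χ² tail uses 1 degree of freedom, which applies only to this two-allele case.

```python
import math
import random

# Genotype (i, j): copies of allele A at locus 1 and of allele B at locus 2.
# Haplotype indices: 0 = AB, 1 = Ab, 2 = aB, 3 = ab.
ALLELES = {2: (1, 1), 1: (1, 0), 0: (0, 0)}

def known_haplotypes(i, j):
    """Haplotype pair of a genotype that is not a double heterozygote."""
    return [2 * (1 - a) + (1 - b) for a, b in zip(ALLELES[i], ALLELES[j])]

def equilibrium_freqs(genos):
    """Haplotype frequencies under no association: products of allele frequencies."""
    n2 = 2 * len(genos)
    pa = sum(i for i, _ in genos) / n2
    pb = sum(j for _, j in genos) / n2
    return [pa * pb, pa * (1 - pb), (1 - pa) * pb, (1 - pa) * (1 - pb)]

def em_haplotype_freqs(genos, tol=1e-10, max_iter=1000):
    """EM estimate of haplotype frequencies, assuming Hardy-Weinberg proportions."""
    counts, n_dh = [0.0] * 4, 0
    for i, j in genos:
        if (i, j) == (1, 1):
            n_dh += 1          # phase unknown: AB/ab or Ab/aB
        else:
            for h in known_haplotypes(i, j):
                counts[h] += 1
    # Starting at the null estimate means EM can only raise the likelihood above it.
    p = equilibrium_freqs(genos)
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis, trans = p[0] * p[3], p[1] * p[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: haplotype counting with the fractional assignments.
        extra = [w, 1 - w, 1 - w, w]
        new = [(c + n_dh * e) / (2 * len(genos)) for c, e in zip(counts, extra)]
        converged = max(abs(x - y) for x, y in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p

def log_likelihood(genos, p):
    """Log-likelihood of the genotype sample given haplotype frequencies p."""
    total = 0.0
    for i, j in genos:
        if (i, j) == (1, 1):
            prob = 2 * (p[0] * p[3] + p[1] * p[2])
        else:
            h, k = known_haplotypes(i, j)
            prob = p[h] * p[k] * (1 if h == k else 2)
        total += math.log(prob)
    return total

def lr_statistic(genos):
    """Likelihood ratio statistic: EM estimate versus no association."""
    ll1 = log_likelihood(genos, em_haplotype_freqs(genos))
    ll0 = log_likelihood(genos, equilibrium_freqs(genos))
    return max(0.0, 2 * (ll1 - ll0))  # guard against rounding below zero

def chi2_p_value(lr):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(lr / 2))

def empirical_p_value(genos, n_perm=1000, seed=0):
    """Tail probability from shuffling locus-2 genotypes among individuals."""
    rng = random.Random(seed)
    observed = lr_statistic(genos)
    first = [i for i, _ in genos]
    second = [j for _, j in genos]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        if lr_statistic(list(zip(first, second))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

As a usage example, a made-up sample such as `genos = [(2, 2)] * 30 + [(0, 0)] * 30 + [(1, 1)] * 40` can be passed to `lr_statistic(genos)`, `chi2_p_value(...)` and `empirical_p_value(genos)` to compare the two tail probabilities. Starting the EM iteration from the no-association estimate is a design choice of this sketch: since EM never decreases the likelihood, the statistic cannot fall below zero, but a single starting point may miss the global maximum.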