Lewontin R C
Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA.
Genetics. 1995 May;140(1):377-88. doi: 10.1093/genetics/140.1.377.
Studies of genetic variation in natural populations at the sequence level usually show that most polymorphic sites are very asymmetrical in allele frequencies, with the commoner allele at a site near fixation. When the rarer allele at a site is present only a few times in the sample, say fewer than five times, it becomes very difficult to detect linkage disequilibrium between sites from tests of association. This is a consequence of the numerical properties of even the most powerful test of association, Fisher's exact test. Sites whose rarer allele has fewer than five representatives in the sample should be excluded from association tests, but this generally leaves few site pairs eligible for testing. A test for overall linkage disequilibrium, based on the sign of the observed linkage disequilibria, is derived that can use all the data. It is shown that, for the same total work, more power can be achieved by increasing the length of sequence determined than by increasing the number of genomes sampled.
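The numerical limitation of Fisher's exact test can be illustrated with a short sketch. It is not taken from the paper. It assumes a sample of n haploid genomes and two biallelic sites whose rarer alleles appear a and b times, and it computes the smallest two-sided p-value that any arrangement of the sample could produce given those counts. The function names (`table_prob`, `fisher_two_sided`, `min_attainable_p`) and the example counts are hypothetical, chosen only for illustration.

```python
from math import comb


def table_prob(n, a, b, x):
    """Hypergeometric probability that x genomes carry both rarer alleles,
    given n genomes, a copies of the rarer allele at site 1 and b at site 2."""
    return comb(a, x) * comb(n - a, b - x) / comb(n, b)


def fisher_two_sided(n, a, b, x):
    """Two-sided Fisher exact p-value: total probability of all tables
    that are no more probable than the observed one (margins held fixed)."""
    lo, hi = max(0, a + b - n), min(a, b)
    p_obs = table_prob(n, a, b, x)
    # Small relative tolerance so that ties in probability are counted.
    return sum(
        p
        for k in range(lo, hi + 1)
        if (p := table_prob(n, a, b, k)) <= p_obs * (1 + 1e-9)
    )


def min_attainable_p(n, a, b):
    """Smallest p-value any 2x2 table with these margins can give."""
    lo, hi = max(0, a + b - n), min(a, b)
    return min(fisher_two_sided(n, a, b, x) for x in range(lo, hi + 1))


if __name__ == "__main__":
    # Illustrative counts only: 20 genomes, rarer allele seen twice at
    # site 1, paired with sites of increasing rarer-allele count.
    for b in (2, 5, 10):
        print(b, min_attainable_p(20, 2, b))
```

Under these assumptions, a site whose rarer allele occurs twice among 20 genomes cannot give a significant result when paired with a site at intermediate frequency (b = 10), however the alleles are arranged, because no table with those margins is improbable enough. The sketch shows only this bound on the p-value; it does not implement the sign-based test that the abstract describes.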