Skelly Daniel A, Magwene Paul M, Stone Eric A
Department of Biology, Duke University, Durham, North Carolina 27708.
Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695.
Genetics. 2016 Feb;202(2):427-37. doi: 10.1534/genetics.115.177816. Epub 2015 Dec 29.
Demographic, genetic, or stochastic factors can lead to perfect linkage disequilibrium (LD) between alleles at two loci without respect to the extent of their physical distance, a phenomenon that Lawrence et al. (2005a) refer to as "genetic indistinguishability." This phenomenon can complicate genotype-phenotype association testing by hindering the ability to localize causal alleles, but has not been thoroughly explored from a theoretical perspective or using large, dense whole-genome polymorphism data sets. We derive a simple theoretical model of the prevalence of genetic indistinguishability between unlinked loci and verify its accuracy via simulation. We show that sample size and minor allele frequency are the major determinants of the prevalence of perfect LD between unlinked loci but that demographic factors, such as deviations from random mating, can produce significant effects as well. Finally, we quantify this phenomenon in three model organisms and find thousands of pairs of moderate-frequency ([Formula: see text]) genetically indistinguishable variants in relatively large data sets. These results clarify a previously underexplored population genetic phenomenon with important implications for association studies and define conditions under which it is likely to manifest.
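The dependence on sample size and minor allele frequency described in the abstract can be illustrated with a small sketch. This is not the authors' model. It is a simplified calculation under stated assumptions: haploid samples, two independent (unlinked) loci that both carry the minor allele in exactly k of n samples with 2k < n, and carriers chosen uniformly at random at each locus. Under those assumptions the two loci are in perfect LD only if the same k samples carry the minor allele at both, which happens with probability 1 / C(n, k). The function names and the parameters `n`, `k`, and `m` are hypothetical and chosen only for illustration.

```python
import math
import random


def prob_identical_pattern(n: int, k: int) -> float:
    """Probability that two independent loci carry the minor allele in exactly
    the same k of n haploid samples, given both have minor allele count k.
    Assumes 2k < n, so a complementary pattern cannot also give perfect LD."""
    return 1.0 / math.comb(n, k)


def expected_indistinguishable_pairs(m: int, n: int, k: int) -> float:
    """Expected number of perfectly correlated pairs among m independent loci
    that all have minor allele count k in n haploid samples."""
    return math.comb(m, 2) * prob_identical_pattern(n, k)


def simulate(n: int, k: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check: draw two random carrier sets of size k and count
    how often they coincide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        carriers_a = frozenset(rng.sample(range(n), k))
        carriers_b = frozenset(rng.sample(range(n), k))
        hits += carriers_a == carriers_b
    return hits / trials


if __name__ == "__main__":
    for n, k in [(10, 2), (20, 2), (20, 4), (100, 5)]:
        print(n, k, prob_identical_pattern(n, k))
    print("simulated n=6, k=2:", simulate(6, 2, 20000))
```

Even in this toy setting, the probability falls quickly as either the sample size or the minor allele count grows, while the number of locus pairs grows quadratically with the number of variants, so dense data sets can still contain many such pairs. Non-random mating and other demographic effects, which the abstract identifies as additional contributors, are not represented here.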