Department of Biology, Indiana University, Bloomington, IN, USA.
Mol Biol Evol. 2010 Jan;27(1):103-11. doi: 10.1093/molbev/msp210.
Copy number variants (CNVs) within humans can have both adaptive and deleterious effects. Because of their phenotypic significance, researchers have attempted to find single nucleotide polymorphisms (SNPs) in high linkage disequilibrium (LD) with CNVs to use in genomewide association studies. However, studies have found that CNVs are less likely than SNPs to be in strong LD with flanking markers. We hypothesized that this "taggability gap" can be explained by duplication events that place paralogous sequences far apart in the genome. In support of our hypothesis, we find that duplications are significantly less likely than deletions to have a "tag" SNP, even after controlling for CNV length, allele frequency, and availability of appropriate flanking SNPs. Using a novel likelihood method, we show that many complex CNVs (those due to multiple duplication or deletion polymorphisms) are made up of two loci with little LD between them. Additionally, we find that many polymorphic duplications detected in a recent clone-based study are located far from their parental loci. We also examine two other common hypotheses for the taggability gap and find that recurrent mutation of both deletions and duplications appears to affect LD, but that lower SNP density around CNVs has no effect. Overall, our results suggest that a substantial fraction of CNVs caused by duplication cannot be tagged by markers flanking the parental locus because they have changed genomic location.
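To make the notion of a "tag" SNP concrete: LD between two biallelic loci is commonly summarized as r² = D² / (pA(1 − pA) · pB(1 − pB)), where D = pAB − pA · pB, and a SNP is often treated as tagging a variant when r² exceeds a cutoff such as 0.8. The following is a minimal sketch under stated assumptions, not the likelihood method used in this study: it assumes phased haplotypes, treats the CNV as a single biallelic locus (reference vs. variant copy number), and the function names `ld_r2` and `is_tagged` and the 0.8 threshold are hypothetical choices for illustration.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic loci from phased haplotypes.

    Hypothetical helper. `haplotypes` is a list of (snp_allele, cnv_allele)
    pairs, each allele coded 0 or 1. The CNV is assumed (for illustration)
    to behave as one biallelic locus.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of SNP allele 1
    p_b = sum(b for _, b in haplotypes) / n        # frequency of CNV allele 1
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                           # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic locus makes r^2 undefined; report 0 (cannot tag).
        return 0.0
    return d * d / denom


def is_tagged(haplotypes, threshold=0.8):
    """Hypothetical tagging rule: the SNP tags the CNV if r^2 >= threshold."""
    return ld_r2(haplotypes) >= threshold


# SNP allele always travels with the CNV allele: perfect LD.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
# All four haplotypes equally common: no LD.
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(ld_r2(linked), is_tagged(linked))      # 1.0 True
print(ld_r2(unlinked), is_tagged(unlinked))  # 0.0 False
```

In this simplified picture, a duplicated copy that has moved far from its parental locus behaves like the unlinked case: its presence is largely independent of the alleles at SNPs flanking the parental site, so r² stays low and no flanking SNP passes the tagging cutoff.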