Locke Devin P, Sharp Andrew J, McCarroll Steven A, McGrath Sean D, Newman Tera L, Cheng Ze, Schwartz Stuart, Albertson Donna G, Pinkel Daniel, Altshuler David M, Eichler Evan E
Department of Genome Sciences, University of Washington and Howard Hughes Medical Institute, Seattle, WA 98195, USA.
Am J Hum Genet. 2006 Aug;79(2):275-90. doi: 10.1086/505653. Epub 2006 Jun 15.
Studies of copy-number variation and linkage disequilibrium (LD) have typically excluded complex regions of the genome that are rich in duplications and prone to rearrangement. In an attempt to assess the heritability and LD of copy-number polymorphisms (CNPs) in duplication-rich regions of the genome, we profiled copy-number variation in 130 putative "rearrangement hotspot regions" among 269 individuals of European, Yoruba, Chinese, and Japanese ancestry analyzed by the International HapMap Consortium. Eighty-four hotspot regions, corresponding to 257 bacterial artificial chromosome (BAC) probes, showed evidence of copy-number differences. Despite a predisposing genetic architecture, no polymorphism was ever observed in the remaining 46 "rearrangement hotspots," and we suggest these represent excellent candidate sites for pathogenic rearrangements. We used a combination of BAC-based and high-density customized oligonucleotide arrays to resolve the molecular basis of structural rearrangements. For common variants (frequency >10%), we observed a distinct bias against copy-number losses, suggesting that deletions are subject to purifying selection. Heritability estimates did not differ significantly from 1.0 among the majority (30 of 34) of loci analyzed, consistent with normal Mendelian inheritance. Some of the CNPs in duplication-rich regions showed strong LD with nearby single-nucleotide polymorphisms (SNPs) and were observed to segregate on ancestral SNP haplotypes. However, LD with the best available SNP markers was weaker than has been reported for deletion polymorphisms in less complex regions of the genome. These observations may be accounted for by a low density of SNP data in duplicated regions, challenges in mapping and typing the CNPs, and the possibility that CNPs in these regions have rearranged on multiple haplotype backgrounds. Our results underscore the need for complete maps of genetic variation in duplication-rich regions of the genome.