McCarroll Steven A, Kuruvilla Finny G, Korn Joshua M, Cawley Simon, Nemesh James, Wysoker Alec, Shapero Michael H, de Bakker Paul I W, Maller Julian B, Kirby Andrew, Elliott Amanda L, Parkin Melissa, Hubbell Earl, Webster Teresa, Mei Rui, Veitch James, Collins Patrick J, Handsaker Robert, Lincoln Steve, Nizzari Marcia, Blume John, Jones Keith W, Rava Rich, Daly Mark J, Gabriel Stacey B, Altshuler David
Program in Medical and Population Genetics and Genetic Analysis Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
Nat Genet. 2008 Oct;40(10):1166-74. doi: 10.1038/ng.238. Epub 2008 Sep 7.
Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.
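The abstract reports CNP allele frequencies derived from integer copy-number genotypes, and strong linkage disequilibrium between diallelic CNPs and SNPs. The sketch below is a rough illustration only and makes several assumptions. It treats the CNP as a diallelic deletion, with a diploid copy number of 2 as the reference state, and counts deletion alleles per sample. It then approximates LD as the squared Pearson correlation of per-sample allele dosages. The function names and toy genotypes are hypothetical. This is not the genotyping or LD method the authors used.

```python
from typing import Sequence


def deletion_allele_dosage(copy_numbers: Sequence[int]) -> list[int]:
    """Convert diploid integer copy numbers (0, 1, 2) at a diallelic
    deletion CNP into per-sample counts of the deletion allele.
    Assumption: copy number 2 is the reference (non-deleted) state."""
    for cn in copy_numbers:
        if cn not in (0, 1, 2):
            raise ValueError(f"unexpected copy number {cn} for a diallelic deletion")
    return [2 - cn for cn in copy_numbers]


def allele_frequency(dosages: Sequence[int]) -> float:
    """Frequency of the counted allele among the 2 * n chromosomes sampled."""
    return sum(dosages) / (2 * len(dosages))


def genotype_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation of two per-sample allele dosages,
    a common approximation of LD r^2 from unphased genotypes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length dosage vectors with n >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site has undefined r^2; report 0 here
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy genotypes for eight samples.
    cnp = deletion_allele_dosage([2, 1, 2, 0, 1, 2, 2, 1])
    snp = [0, 1, 0, 2, 1, 0, 1, 1]
    print(cnp, allele_frequency(cnp), genotype_r2(cnp, snp))
```

Genotype correlation is used here because it needs no phasing. With phased haplotypes, r^2 would normally be computed from haplotype frequencies instead.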