Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA.
Am J Hum Genet. 2011 Mar 11;88(3):317-32. doi: 10.1016/j.ajhg.2011.02.004.
Copy-number variants (CNVs) can reach appreciable frequencies in the human population, and recent discoveries have shown that several of these copy-number polymorphisms (CNPs) are associated with human diseases, including lupus, psoriasis, Crohn disease, and obesity. Despite new advances, significant biases remain in CNP discovery and genotyping. We developed a genotyping method based on single-channel intensity data, benchmarked it against copy numbers determined from sequencing read depth, and used it to obtain genotypes for 1495 CNPs in 487 human DNA samples of diverse ethnic backgrounds. Our microarray contained CNPs in segmental duplication-rich regions and insertions of sequences not represented in the reference genome assembly or on standard SNP microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that biallelic CNPs show greater stratification than frequency-matched SNPs (p = 0.0026). Although biallelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multicopy CNPs do not (only 40% have r > 0.8). We selected a subset of CNPs for further characterization in 1876 additional samples from 62 populations; this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, which encodes a tight junction protein involved in hepatitis C viral entry. Our microarray design allows these variants to be rapidly tested for disease association, and our results suggest that CNPs (especially those that cannot be imputed from SNP genotypes) might have contributed disproportionately to human diversity and selection.
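The r > 0.8 criterion for correlation between copy number and flanking SNP genotypes can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: r is taken to be the Pearson correlation between integer copy number and SNP genotype coded as 0, 1, or 2 copies of one allele; the absolute value of r is used because the choice of counted allele is arbitrary; and the function names and toy genotypes are hypothetical and do not come from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A monomorphic CNP or SNP carries no correlation signal.
        return 0.0
    return sxy / sqrt(sxx * syy)


def is_tagged(copy_numbers, snp_genotypes, threshold=0.8):
    """True if |r| between copy number and SNP genotype exceeds threshold.

    Assumption: the sign of r is ignored, since it depends only on which
    SNP allele is counted.
    """
    return abs(pearson_r(copy_numbers, snp_genotypes)) > threshold


# Hypothetical toy data, one value per sample (not from the study).
biallelic_cn = [2, 2, 1, 1, 0, 2, 1, 0]   # deletion CNP: 0, 1, or 2 copies
biallelic_snp = [0, 0, 1, 1, 2, 0, 1, 2]  # flanking SNP in perfect LD
multicopy_cn = [2, 3, 4, 5, 3, 4, 2, 6]   # multicopy CNP
multicopy_snp = [0, 1, 0, 2, 1, 1, 2, 0]  # flanking SNP, weakly related

print(is_tagged(biallelic_cn, biallelic_snp))
print(is_tagged(multicopy_cn, multicopy_snp))
```

In this toy case the biallelic CNP is perfectly tracked by its flanking SNP, whereas the multicopy CNP is not, which is the kind of contrast the abstract describes for CNPs that cannot be imputed from SNP genotypes.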