Itsara Andy, Cooper Gregory M, Baker Carl, Girirajan Santhosh, Li Jun, Absher Devin, Krauss Ronald M, Myers Richard M, Ridker Paul M, Chasman Daniel I, Mefford Heather, Ying Phyllis, Nickerson Deborah A, Eichler Evan E
Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA 98195, USA.
Am J Hum Genet. 2009 Feb;84(2):148-61. doi: 10.1016/j.ajhg.2008.12.014. Epub 2009 Jan 22.
Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in approximately 2500 individuals by using Illumina SNP data, with an emphasis on "hotspots" prone to recurrent mutations. We find variants larger than 500 kb in 5%-10% of individuals and variants greater than 1 Mb in 1%-2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%-1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.