Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China.
PLoS One. 2011;6(11):e27341. doi: 10.1371/journal.pone.0027341. Epub 2011 Nov 7.
The human genome contains extensive copy number variations (CNVs). Investigating the medical and evolutionary impacts of CNVs requires knowledge of their locations, sizes and frequency distributions within and between populations. However, CNVs in Chinese minorities, which harbor the majority of the genetic diversity of Chinese populations, have been understudied relative to the efforts made in other populations. Here we constructed, to our knowledge, the first CNV map in seven Chinese populations representing the major linguistic groups in China, with 1,440 CNV regions identified using the Affymetrix SNP 6.0 Array. We observed considerable differences in the distributions of CNV regions between populations, as well as substantial population structure. We showed that ∼35% of the CNV regions identified in minority ethnic groups are not shared with the Han Chinese population, indicating that the contribution of the minorities to the genetic architecture of the Chinese population cannot be ignored. We further identified CNV regions that are highly differentiated between populations. For example, a deletion common in Dong and Zhuang (44.4% and 50%, respectively), which overlaps two keratin-associated protein genes contributing to the structure of hair fibers, was not observed in Han Chinese. Interestingly, the CNV deletion containing the CCL3L1 gene, reported in previous studies as the most differentiated between HapMap CEU and YRI, was also the most highly differentiated region between Tibetans and other populations. In addition, by jointly analyzing CNVs and SNPs, we found that a CNV region containing the gene CTDSPL was in almost perfect linkage disequilibrium with flanking SNPs in Tibetans, but not in other populations except HapMap CHD. Furthermore, we found that the SNP taggability of CNVs in Chinese populations was much lower than that in European populations.
Our results suggest that a full characterization of CNVs in Chinese populations is necessary, and the CNV map we constructed serves as a useful resource for further evolutionary and medical studies.
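The notion of SNP taggability used in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes unphased genotypes coded as allele dosages (0, 1 or 2 copies of the deletion or of the SNP minor allele), approximates linkage disequilibrium by the squared Pearson correlation of those dosages, and calls a CNV "tagged" when its best flanking SNP reaches r² ≥ 0.8. The threshold, the function names and the genotype vectors are all hypothetical choices made for illustration.

```python
from statistics import mean


def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    This is a common approximation of LD r^2 for unphased data; it is an
    assumption of this sketch, not a method stated in the abstract.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def best_tag(cnv, snps):
    """Return (snp_id, r2) for the flanking SNP most correlated with the CNV."""
    return max(
        ((snp_id, genotype_r2(cnv, geno)) for snp_id, geno in snps.items()),
        key=lambda pair: pair[1],
    )


def is_tagged(cnv, snps, threshold=0.8):
    """True if at least one flanking SNP reaches the (assumed) r^2 threshold."""
    return best_tag(cnv, snps)[1] >= threshold


# Synthetic dosages for six individuals (made up, not data from the study).
cnv_dosage = [0, 1, 2, 1, 0, 2]
flanking_snps = {
    "snp_a": [0, 1, 2, 1, 0, 2],  # identical to the CNV dosages
    "snp_b": [2, 1, 0, 0, 1, 2],  # weakly correlated with the CNV
}

if __name__ == "__main__":
    print(best_tag(cnv_dosage, flanking_snps))
    print(is_tagged(cnv_dosage, {"snp_b": flanking_snps["snp_b"]}))
```

Under these assumptions, the fraction of CNVs for which `is_tagged` returns true in each population would give one way to compare taggability between, for example, Chinese and European samples.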