State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China.
Max Planck Independent Research Group on Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai, China.
J Med Genet. 2017 Oct;54(10):685-692. doi: 10.1136/jmedgenet-2017-104613. Epub 2017 Jul 13.
Copy number variation (CNV) is a valuable source of genetic diversity in the human genome and a well-recognised cause of various genetic diseases. However, CNVs have been considerably under-represented in population-based studies, particularly in the Han Chinese, the largest ethnic group in the world.
To build a representative CNV map for the Han Chinese population.
We conducted a genome-wide CNV study of 451 male Han Chinese samples from 11 geographical regions encompassing 28 dialect groups, a less biased panel than the currently available data. We detected CNVs using a 4.2M NimbleGen comparative genomic hybridisation array, and used whole-genome deep sequencing of 51 samples to optimise the filtering conditions for CNV discovery.
A comprehensive Han Chinese CNV map was built from a set of high-quality variants (positive predictive value >0.8; sizes ranging from 369 bp to 4.16 Mb, with a median of 5907 bp). The map consists of 4012 CNV regions (CNVRs), more than half of which are novel relative to the 30 East Asian CNV Project and the 1000 Genomes Project Phase 3. We further identified 81 CNVRs specific to regional groups, indicating subpopulation structure within the Han Chinese population.
Our data are complementary to public data sources, and the CNV map may facilitate the identification of pathogenic CNVs and further biomedical research involving the Han Chinese population.
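The abstract does not state how individual CNV calls were grouped into CNVRs. As an illustration only, the sketch below assumes one common convention: calls on the same chromosome that overlap by at least one base are merged into a single region. The tuple layout `(chromosome, start, end)` and the function name `merge_into_cnvrs` are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical call layout: (chromosome, start, end), coordinates inclusive.
Call = Tuple[str, int, int]


def merge_into_cnvrs(calls: Iterable[Call]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping CNV calls into CNV regions, per chromosome.

    Assumption: two calls belong to the same region if they share at
    least one base. The study may have used a different criterion.
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))

    regions: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()  # sort by start, then end
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlaps the current region: extend its end if needed.
                merged[-1] = (last_start, max(last_end, end))
            else:
                # No overlap: start a new region.
                merged.append((start, end))
        regions[chrom] = merged
    return regions


# Made-up coordinates, for illustration only.
example_calls = [
    ("chr1", 100, 500),
    ("chr1", 400, 900),
    ("chr1", 2000, 2500),
    ("chr2", 100, 200),
]
cnvrs = merge_into_cnvrs(example_calls)
```

With these made-up inputs, the first two `chr1` calls collapse into one region while the third stays separate. A stricter rule, such as requiring reciprocal overlap above a threshold, would yield more and smaller regions.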