Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA.
Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA.
Mol Biol Evol. 2018 Nov 1;35(11):2736-2750. doi: 10.1093/molbev/msy170.
As are most non-European populations, the Han Chinese are relatively understudied in population and medical genetics studies. From low-coverage whole-genome sequencing of 11,670 Han Chinese women, we present a catalog of 25,057,223 variants, including 548,401 novel variants that are seen at least 10 times in our data set. Individuals in this data set came from 24 of the 33 administrative divisions across China (including 19 provinces, 4 municipalities, and 1 autonomous region), which allowed us to study population structure, genetic ancestry, and local adaptation in Han Chinese. We identified previously unrecognized population structure along the East-West axis of China, demonstrated a general pattern of isolation-by-distance among Han Chinese, and reported unique regional signals of admixture, such as European influences among the Northwestern provinces of China. Furthermore, we identified a number of highly differentiated, putatively adaptive loci (e.g., MTHFR, ADH7, and FADS, among others) that may be driven by immune response, climate, and diet in the Han Chinese. Finally, we have made allele frequency estimates, stratified by administrative division across China, available to the broader community in the Geography of Genetic Variant browser. By leveraging the largest currently available genetic data set for Han Chinese, we have gained insights into the history and population structure of the world's largest ethnic group.