Human Genetics, Genome Institute of Singapore, Singapore 138672, Singapore.
Am J Hum Genet. 2009 Dec;85(6):775-85. doi: 10.1016/j.ajhg.2009.10.016.
Population stratification is a potential problem for genome-wide association studies (GWAS), confounding results and causing spurious associations. Hence, understanding how allele frequencies vary across geographic regions or among subpopulations is an important prelude to analyzing GWAS data. Using over 350,000 genome-wide autosomal SNPs in over 6000 Han Chinese samples from ten provinces of China, our study revealed a one-dimensional "north-south" population structure and a close correlation between geography and the genetic structure of the Han Chinese. The north-south population structure is consistent with the historical migration pattern of the Han Chinese population. Samples from metropolitan cities in China were, however, more diffuse "outliers," probably because of modern migration. At a very local scale within Guangdong province, we observed evidence of population structure among dialect groups, probably on account of endogamy within these groups. Via simulation, we show that empirical levels of population structure observed across modern China can cause spurious associations in GWAS if not properly handled. In the Han Chinese, geographic matching is a good proxy for genetic matching. This is particularly useful in validation and candidate-gene studies, in which population stratification cannot be directly assessed and accounted for because genome-wide data are lacking. The exception is the metropolitan cities, where geographical location is no longer a good indicator of ancestral origin. Our findings are important for designing GWAS in the Chinese population, an activity that is expected to intensify greatly in the near future.
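The effect of stratification on association statistics can be illustrated with a toy simulation. The sketch below is not the simulation used in the study: it assumes a two-population Balding-Nichols model with a hypothetical F_ST of 0.002, null SNPs only, and an allelic chi-square test, and it summarizes inflation with the genomic control factor lambda (median test statistic divided by the median of a 1-df chi-square). All function names, sample sizes, and parameter values are illustrative assumptions.

```python
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def allelic_chi2(cases, controls):
    """Per-SNP 2x2 allelic chi-square statistic (1 df) from 0/1/2 genotypes."""
    a = cases.sum(axis=0).astype(float)      # alternate alleles in cases
    b = 2.0 * cases.shape[0] - a             # reference alleles in cases
    c = controls.sum(axis=0).astype(float)   # alternate alleles in controls
    d = 2.0 * controls.shape[0] - c          # reference alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / np.maximum(denom, 1.0)


def genomic_inflation(north_frac_cases, north_frac_controls,
                      n_per_group=1000, n_snps=5000, fst=0.002, seed=0):
    """Lambda for null SNPs when cases and controls differ in north/south mix.

    fst, n_per_group and n_snps are hypothetical values chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, n_snps)
    # Balding-Nichols model: subpopulation frequencies are Beta-distributed
    # around the ancestral frequency, with spread controlled by F_ST.
    scale = (1.0 - fst) / fst
    p_north = rng.beta(ancestral * scale, (1.0 - ancestral) * scale)
    p_south = rng.beta(ancestral * scale, (1.0 - ancestral) * scale)

    def sample_group(north_frac):
        n_north = int(round(north_frac * n_per_group))
        north = rng.binomial(2, p_north, size=(n_north, n_snps))
        south = rng.binomial(2, p_south, size=(n_per_group - n_north, n_snps))
        return np.vstack([north, south])

    cases = sample_group(north_frac_cases)
    controls = sample_group(north_frac_controls)
    return np.median(allelic_chi2(cases, controls)) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Geographically matched design: same north/south mix in both groups.
    print("matched lambda:   ", genomic_inflation(0.5, 0.5))
    # Unmatched design: cases mostly northern, controls mostly southern.
    print("unmatched lambda: ", genomic_inflation(0.7, 0.3))
```

Under these assumptions, the matched design should give lambda close to 1, while the unmatched design should give a clearly inflated lambda even though no SNP affects the phenotype. This is the kind of spurious signal that geographic matching is meant to avoid.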