Qin Pengfei, Li Zhiqiang, Jin Wenfei, Lu Dongsheng, Lou Haiyi, Shen Jiawei, Jin Li, Shi Yongyong, Xu Shuhua
[1] Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Shanghai, China [2] Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.
Shanghai Genome Pilot Institutes for Genomics and Human Health, Shanghai, China.
Eur J Hum Genet. 2014 Feb;22(2):248-53. doi: 10.1038/ejhg.2013.111. Epub 2013 May 29.
Population stratification acts as a confounding factor in genetic association studies and may lead to false-positive or false-negative results. Previous studies have analyzed the genetic substructure of the Han Chinese population, the largest ethnic group in the world, comprising ∼20% of the global human population. In this study, we examined 5540 Han Chinese individuals with about 1 million single-nucleotide polymorphisms (SNPs) and screened a panel of ancestry informative markers (AIMs) to facilitate discerning and controlling for population structure in future association studies on Han Chinese. Based on genome-wide data, we first confirmed our previous observation of north-south differentiation in the Han Chinese population. Second, we developed a panel of 150 validated SNP AIMs to determine the northern or southern origin of each Han Chinese individual. We further evaluated the performance of our AIMs panel in association studies through simulation analysis. Our results showed that this AIMs panel had sufficient power to discern and control population stratification in Han Chinese, which could significantly reduce false-positive rates in both genome-wide association studies (GWAS) and candidate gene association studies (CGAS). We suggest that this AIMs panel be genotyped and used to control and correct for population stratification in the study design or data analysis of future association studies, especially in CGAS, which is the most popular approach to validating previous reports of genetic associations with diseases in the post-GWAS era.
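The abstract does not state how the AIMs were screened. As a rough illustration only, the sketch below ranks SNPs by the absolute allele-frequency difference between a northern and a southern reference group, which is one common way to shortlist candidate AIMs. The criterion, the function names `allele_freq` and `rank_aims`, and the toy genotypes are all assumptions made for this example, not the authors' procedure or data.

```python
# Illustrative sketch only: the ranking criterion and all names here are
# assumptions, not the method used in the paper.

def allele_freq(genotypes):
    """Frequency of the counted allele from 0/1/2 genotype codes.

    Missing calls are given as None and ignored. Returns None if no
    genotype was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def rank_aims(north, south, top_k):
    """Return the top_k SNP ids with the largest north-south frequency gap.

    north, south: dict mapping SNP id -> list of genotype codes (0, 1, 2, None).
    Only SNPs present in both groups with at least one called genotype
    in each group are scored.
    """
    scores = []
    for snp in north.keys() & south.keys():
        p_north = allele_freq(north[snp])
        p_south = allele_freq(south[snp])
        if p_north is None or p_south is None:
            continue
        scores.append((abs(p_north - p_south), snp))
    # Largest difference first; ties broken by SNP id for reproducibility.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [snp for _, snp in scores[:top_k]]


# Made-up toy genotypes (hypothetical SNP ids, four individuals per group).
toy_north = {"snpA": [2, 2, 1, 2], "snpB": [1, 1, 0, 1], "snpC": [0, 1, None, 0]}
toy_south = {"snpA": [0, 0, 1, 0], "snpB": [1, 0, 1, 1], "snpC": [1, 1, 2, 1]}

candidate_aims = rank_aims(toy_north, toy_south, top_k=2)
```

In practice, a shortlist like this would still need validation in independent samples, and markers in strong linkage disequilibrium with each other would typically be pruned before a panel is finalized.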