Cao Zong-Fu, Ma Chuan-Xiang, Wang Lei, Cai Bin
1. National Engineering Research Center for Beijing Biochip Technology, Beijing 102206, China; 2. CapitalBio Corporation, Beijing 102206, China.
Yi Chuan. 2010 Sep;32(9):921-8.
Because population genetic structure can increase the false-positive rate in genome-wide association studies (GWAS) of complex diseases, the effect of population stratification should be taken into account in GWAS. However, the effectiveness of randomly selected single nucleotide polymorphisms (SNPs) in population stratification analysis has not been determined. In this study, using genotype data generated on the Genome-Wide Human SNP Array 6.0 from unrelated HapMap Phase 2 individuals, we randomly selected SNPs evenly distributed across the whole genome and identified Ancestry Informative Markers (AIMs) using the f value method and the allelic Fisher exact test. F-statistics and STRUCTURE analysis based on the different selected SNP sets were then used to evaluate how well each set distinguished the HapMap Phase 3 populations. We found that randomly selected SNPs evenly distributed across the whole genome can be used to identify population structure. The results further indicate that more than 3,000 such SNPs can substitute for AIMs in population stratification analysis when no AIMs are available for the specific populations under study.
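The two core steps named in the abstract, drawing random SNPs that are spread evenly across the genome and scoring how strongly a SNP differentiates populations, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state the binning scheme, the exact F-statistic estimator, or any function names, so `pick_evenly_spaced_snps` and `wright_fst` are hypothetical helpers, and the estimator shown is the basic two-population Wright F_ST with equal population weights rather than the authors' actual procedure.

```python
import random


def pick_evenly_spaced_snps(positions, n_bins, seed=0):
    """Pick one random SNP from each non-empty, equal-width positional bin.

    `positions` are base-pair coordinates on a single chromosome.
    Binning is an assumed way to get an even genomic spread.
    """
    rng = random.Random(seed)
    lo, hi = min(positions), max(positions)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1  # all SNPs share one coordinate; use a single bin
    bins = {}
    for pos in positions:
        # Clamp so the right-most SNP falls into the last bin.
        idx = min(int((pos - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(pos)
    return sorted(rng.choice(members) for members in bins.values())


def wright_fst(p1, p2):
    """Basic Wright F_ST for one biallelic SNP in two populations.

    p1, p2 are the frequencies of the same allele in each population.
    F_ST = (H_T - H_S) / H_T with equal population weights (an assumption).
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    return (h_t - h_s) / h_t


# Example usage with made-up coordinates and allele frequencies.
snp_positions = list(range(0, 1000, 7))
random_panel = pick_evenly_spaced_snps(snp_positions, n_bins=10, seed=1)
fst_fixed_difference = wright_fst(1.0, 0.0)  # alleles fixed for opposite states
fst_identical = wright_fst(0.3, 0.3)         # identical frequencies
```

In this sketch, a SNP with a high F_ST value would be a candidate AIM, while the binned random draw stands in for the evenly distributed random SNP sets; a real analysis would use sample-size-aware estimators and per-chromosome binning over the full array.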