Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America.
PLoS One. 2010 Oct 13;5(10):e13336. doi: 10.1371/journal.pone.0013336.
The population of Costa Rica (CR) represents an admixture of major continental populations. An investigation of the CR population structure would provide an important foundation for mapping genetic variants underlying common diseases and traits. We analyzed 1,301 women from the Guanacaste region of CR using 27,904 single nucleotide polymorphisms (SNPs) genotyped on a custom Illumina InfiniumII iSelect chip. The program STRUCTURE was used to compare the CR Guanacaste sample with four continental reference samples: HapMap Europeans (CEU), East Asians (JPT+CHB) and West African Yoruba (YRI), as well as Native Americans (NA) from the Illumina iControl database. Our results show that the CR Guanacaste sample comprises a three-way admixture estimated to be 43% European, 38% Native American and 15% West African. An estimated 4% residual Asian ancestry may be within the error range. Results from principal components analysis reveal a correlation between genetic and geographic distance. Linkage disequilibrium (LD), measured by the number of tagging SNPs required to cover the same genomic region, appeared weaker in the CR Guanacaste sample than in the CEU, JPT+CHB and NA reference samples but stronger than in the HapMap YRI sample. Based on the clustering pattern observed in both STRUCTURE and principal components analysis, we identified two subpopulations that differ by approximately 20% in LD block size, averaged over all LD blocks identified by Haploview. In a simulated association study conducted within the two subpopulations, we also show that failure to account for population stratification (PS) could lead to a noticeable inflation in the false positive rate. However, we further demonstrate that existing PS adjustment approaches can reduce the inflation to a level acceptable for gene discovery.
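The inflation from unadjusted population stratification described above can be illustrated with a small simulation. The sketch below is not the study's simulation and does not reproduce its PS adjustment approaches, which the abstract does not name. It assumes two hypothetical subpopulations with different disease prevalence and slightly different allele frequencies at SNPs that have no true effect, and it uses a stratified Cochran-Mantel-Haenszel test as one simple stand-in for an adjustment. All sample sizes, frequencies and function names are illustrative assumptions.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def draw_allele_count(n_alleles, freq, rng):
    """Count reference alleles among n_alleles independent draws."""
    return sum(rng.random() < freq for _ in range(n_alleles))


def cmh_chi2(tables):
    """Cochran-Mantel-Haenszel statistic (1 df) over a list of 2x2 tables.

    Each table is (a, b, c, d): case ref, case alt, control ref,
    control alt allele counts. With a single table this is the
    Mantel-Haenszel chi-square, nearly identical to Pearson's.
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    return diff * diff / var if var > 0 else 0.0


def simulate_lambdas(n_snps=1000, seed=1):
    """Return (lambda ignoring strata, lambda with stratified test)."""
    rng = random.Random(seed)
    # Assumed design: (cases, controls) in each subpopulation.
    # Disease is more common in the first subpopulation.
    design = [(140, 60), (60, 140)]
    pooled_stats, stratified_stats = [], []
    for _ in range(n_snps):
        # Null SNP: frequency differs by subpopulation, not by disease.
        base = rng.uniform(0.2, 0.8)
        shift = rng.uniform(-0.1, 0.1)
        freqs = [base + shift, base - shift]
        tables = []
        for (n_case, n_ctrl), freq in zip(design, freqs):
            a = draw_allele_count(2 * n_case, freq, rng)
            c = draw_allele_count(2 * n_ctrl, freq, rng)
            tables.append((a, 2 * n_case - a, c, 2 * n_ctrl - c))
        # Ignoring stratification: collapse both subpopulations.
        pooled = tuple(sum(cell) for cell in zip(*tables))
        pooled_stats.append(cmh_chi2([pooled]))
        # Accounting for stratification: test within strata.
        stratified_stats.append(cmh_chi2(tables))
    lam_pooled = statistics.median(pooled_stats) / CHI2_1DF_MEDIAN
    lam_strat = statistics.median(stratified_stats) / CHI2_1DF_MEDIAN
    return lam_pooled, lam_strat
```

Calling `simulate_lambdas()` returns two genomic inflation factors (median test statistic divided by its expected null median). Under these assumptions the pooled value should sit well above 1, while the stratified value should stay near 1, mirroring the qualitative pattern reported in the abstract.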