Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
Eur J Hum Genet. 2010 Apr;18(4):471-8. doi: 10.1038/ejhg.2009.184. Epub 2009 Oct 21.
Detecting loci affected by positive natural selection through the analysis of genetic variation in human populations is important for understanding adaptive history and phenotypic variation in humans. In this study, we analyzed recent positive selection in Northern Europe using genome-wide data sets of 250 000 and 500 000 single-nucleotide polymorphisms (SNPs) in a total of 999 individuals from Great Britain, Northern Germany, Eastern and Western Finland, and Sweden. Coalescent simulations demonstrated that the integrated haplotype score (iHS) and long-range haplotype (LRH) statistics have sufficient power in genome-wide data sets of different sample sizes and SNP densities. Furthermore, the behavior of the F(ST) statistic in closely related populations was characterized by allele frequency simulations. In the analysis of the North European data set, 60 regions in the genome showed strong signs of recent positive selection. Of these, 21 regions had not been discovered in previous scans, and many contain genes with interesting functions (e.g., RAB38, IFNG, NOS1AP, and APOE). In the putatively selected regions, we observed a statistically significant overrepresentation of genetic associations with complex disease, which underscores the importance of analyzing positive selection for understanding the evolution of human disease. Altogether, this study demonstrates the potential of genome-wide data sets to reveal the loci that underlie evolutionary adaptation in different human populations.
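As a rough illustration of the F(ST) statistic mentioned above, the following minimal sketch computes a heterozygosity-based F(ST) for a single biallelic SNP from per-population allele frequencies. The abstract does not say which estimator the study used, so the formula (H_T - H_S) / H_T with equal population weights, and the function name `fst_biallelic`, are assumptions made for illustration only.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based F_ST for one biallelic SNP.

    freqs: allele frequencies of the same allele in each population.
    Assumption: populations are weighted equally; this is an
    illustrative estimator, not necessarily the one used in the study.
    """
    if not freqs:
        raise ValueError("at least one population frequency is required")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")

    n = len(freqs)
    p_bar = sum(freqs) / n                         # mean allele frequency
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                                 # SNP is monomorphic everywhere
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / n  # mean within-population
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Two closely related populations with similar frequencies give a small F_ST.
    print(fst_biallelic([0.4, 0.6]))
    # Populations fixed for opposite alleles give the maximum value.
    print(fst_biallelic([0.0, 1.0]))
```

With closely related populations such as those in this study, per-SNP values of this kind are expected to be small, which is one reason the behavior of the statistic needs to be characterized before outliers are interpreted as signs of selection.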