Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands.
Eur J Hum Genet. 2013 Nov;21(11):1277-85. doi: 10.1038/ejhg.2013.48. Epub 2013 Mar 27.
Genetic variation in a population can be summarized through principal component analysis (PCA) on genome-wide data. PCs derived from such analyses are valuable for genetic association studies, where they can be used to correct for population stratification. We investigated how to capture the genetic population structure in a well-characterized sample from the Netherlands and in a worldwide data set, and examined whether (1) removing long-range linkage disequilibrium (LD) regions and LD-based SNP pruning significantly improves correlations between PCs and geography and (2) genetic differentiation may have been influenced by migration and/or selection. In the Netherlands, three PCs showed significant correlations with geography, distinguishing (1) North from South, (2) East from West, and (3) the middle band from the rest of the country. The third PC only emerged with minimized LD, which also significantly increased correlations with geography for the other two PCs. In addition to geography, the Dutch North-South PC showed correlations with genome-wide homozygosity (r=0.245), which may reflect a serial-founder effect due to northwards migration, and also with height (♂: r=0.142, ♀: r=0.153). The divergence between subpopulations identified by PCs is partly driven by selection pressures. The first three PCs showed significant signals for diversifying selection (545 SNPs, the majority within 184 genes). The strongest signal was observed between North and South for the functional SNP in HERC2 that determines human blue/brown eye color. Thus, this study demonstrates how to increase ancestry signals in a relatively homogeneous population and how those signals can reveal evolutionary history.
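The abstract does not state the software or parameters used, so the following is only a minimal pure-Python sketch of the general pipeline it describes: LD-based SNP pruning, PCA on genome-wide genotypes, and correlating a PC with geography. Assumptions: genotypes are coded as 0/1/2 allele counts in a samples x SNPs matrix; pruning is a simplified greedy windowed r² filter (window of 50 SNPs, r² < 0.2, values chosen for illustration) and not PLINK's exact algorithm; each SNP is standardized as (g - 2p)/sqrt(2p(1 - p)); the leading PC is found by power iteration. The sketch does not model removal of long-range LD regions. The helper `simulate_two_regions` and all its data are hypothetical toy values, not data from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ld_prune(genotypes, window=50, r2_max=0.2):
    """Greedy LD pruning over SNPs in genomic order (columns): keep a SNP only if
    its r^2 with every already-kept SNP at most `window` positions upstream is
    below `r2_max`. Monomorphic SNPs are dropped. Simplified illustration only."""
    columns = [list(col) for col in zip(*genotypes)]
    kept = []
    for j, col in enumerate(columns):
        if len(set(col)) < 2:
            continue
        if all(pearson(col, columns[k]) ** 2 < r2_max
               for k in kept if j - k <= window):
            kept.append(j)
    return kept


def standardize(genotypes, snps):
    """Samples x SNPs matrix of (g - 2p) / sqrt(2p(1 - p)) for the chosen SNPs,
    where g is the 0/1/2 allele count and p the sample allele frequency."""
    cols = []
    for j in snps:
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * len(col))
        sd = math.sqrt(2 * p * (1 - p))
        cols.append([(g - 2 * p) / sd for g in col])
    return [list(row) for row in zip(*cols)]


def top_pc(X, iters=100, seed=1):
    """Sample scores on the first PC: leading eigenvector of K = X X^T / m,
    found by power iteration (the sign of the result is arbitrary)."""
    n, m = len(X), len(X[0])
    K = [[sum(a * b for a, b in zip(X[i], X[k])) / m for k in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    # Random start: the all-ones vector lies in K's null space after centering.
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        w = [sum(K[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def simulate_two_regions(n_per_group=50, n_snps=200, seed=42):
    """Hypothetical toy data: a 'north' and a 'south' group whose allele
    frequencies differ at every SNP, plus a noisy latitude per sample."""
    rng = random.Random(seed)
    shifts = [rng.uniform(0.05, 0.2) * rng.choice([-1, 1]) for _ in range(n_snps)]
    base = [rng.uniform(0.3, 0.7) for _ in range(n_snps)]
    genotypes, latitudes = [], []
    for sign, lat in ((1, 53.0), (-1, 51.0)):
        for _ in range(n_per_group):
            # Two allele draws per SNP give a 0/1/2 genotype.
            row = [sum(rng.random() < b + sign * s for _ in range(2))
                   for b, s in zip(base, shifts)]
            genotypes.append(row)
            latitudes.append(lat + rng.gauss(0, 0.3))
    return genotypes, latitudes


genotypes, latitudes = simulate_two_regions()
snps = ld_prune(genotypes)
pc1 = top_pc(standardize(genotypes, snps))
print(len(snps), "SNPs kept; r(PC1, latitude) =", round(pearson(pc1, latitudes), 3))
```

Because the sign of a PC is arbitrary, it is the magnitude of the correlation with latitude that matters; in this toy setting the first PC separates the two simulated groups and therefore tracks latitude. Comparing the correlation with and without the `ld_prune` step on data containing real LD blocks is one way to explore point (1) of the abstract, though this sketch does not reproduce the study's analysis.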