Lundmark Per E, Liljedahl Ulrika, Boomsma Dorret I, Mannila Heikki, Martin Nicholas G, Palotie Aarno, Peltonen Leena, Perola Markus, Spector Tim D, Syvänen Ann-Christine
Molecular Medicine, Department of Medical Sciences, Uppsala University Hospital, Uppsala University, Uppsala, Sweden.
Eur J Hum Genet. 2008 Sep;16(9):1142-50. doi: 10.1038/ejhg.2008.77. Epub 2008 Apr 9.
We studied how well the European CEU samples used in the Haplotype Mapping Project (HapMap) represent five European populations by analyzing nuclear family samples from the Swedish, Finnish, Dutch, British and Australian (European ancestry) populations. The number of samples from each population (about 30 parent-offspring trios) was similar to that in the HapMap sample sets. A panel of 186 single nucleotide polymorphisms (SNPs) distributed over the 1.5 Mb region of the GRID2 gene on chromosome 4 was genotyped. The genotype data were compared pairwise between the HapMap sample and each of the other population samples. Principal component analysis (PCA) was used to cluster the data from the different populations with respect to allele frequencies and to identify the markers responsible for the observed variance. The only sample with detectable differences in allele frequencies was that from Kuusamo, Finland. This sample also separated from the others, including the other Finnish sample, in the PCA. A set of tagSNPs was defined based on the HapMap data and applied to the samples. At r² > 0.8, these tagSNPs captured between 95% (Kuusamo sample) and 87% (Australian sample) of the genetic variation in the analyzed region. To capture the maximal genetic variation in the region, the Kuusamo, HapMap and Australian samples required 58, 63 and 73 native tagSNPs, respectively. The HapMap CEU sample represents the European samples well for tagSNP selection, with some caution needed when estimating allele frequencies in the Finnish Kuusamo sample, and with a slight reduction in tagging efficiency in the Australian sample.
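The tagging criterion in the abstract (a SNP counts as captured when it has r² > 0.8 with at least one tagSNP) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `r2` and `fraction_captured` are hypothetical, the genotypes are invented toy data coded as 0/1/2 minor-allele counts, and r² is approximated here by the squared Pearson correlation of genotype counts rather than by a haplotype-based estimate.

```python
def r2(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding).

    Assumes both SNPs are polymorphic in the sample (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def fraction_captured(snps, tag_idx, threshold=0.8):
    """Fraction of SNPs that are tags or have r^2 > threshold with some tag."""
    tags = set(tag_idx)
    captured = 0
    for i, snp in enumerate(snps):
        if i in tags or any(r2(snp, snps[t]) > threshold for t in tags):
            captured += 1
    return captured / len(snps)


# Toy data (invented): four SNPs genotyped in six individuals.
snps = [
    [0, 1, 2, 0, 1, 2],  # SNP 0
    [0, 1, 2, 0, 1, 2],  # SNP 1: identical to SNP 0
    [2, 1, 0, 2, 1, 0],  # SNP 2: alleles flipped relative to SNP 0
    [0, 0, 1, 1, 2, 2],  # SNP 3: only weakly correlated with SNP 0
]

coverage_one_tag = fraction_captured(snps, [0])
coverage_two_tags = fraction_captured(snps, [0, 3])
```

In this toy set, tagging with SNP 0 alone leaves SNP 3 uncaptured, so a second tag is needed for full coverage. The same logic underlies the comparison in the abstract, where tags chosen in one sample are scored against the SNPs observed in another.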