Fedko Iryna O, Hottenga Jouke-Jan, Medina-Gomez Carolina, Pappa Irene, van Beijsterveldt Catharina E M, Ehli Erik A, Davies Gareth E, Rivadeneira Fernando, Tiemeier Henning, Swertz Morris A, Middeldorp Christel M, Bartels Meike, Boomsma Dorret I
Department of Biological Psychology, VU University Amsterdam, Van der Boechorststraat 1, 1081BT, Amsterdam, The Netherlands.
Behav Genet. 2015 Sep;45(5):514-28. doi: 10.1007/s10519-015-9725-7. Epub 2015 Jun 3.
Combining genotype data across cohorts increases the power to estimate the heritability due to common single nucleotide polymorphisms (SNPs), based on analysis of a Genetic Relationship Matrix (GRM). However, combining SNP data across multiple cohorts may lead to stratification when, for example, different genotyping platforms are used. In the current study, we address issues of combining SNP data from two different cohorts, the Netherlands Twin Register (NTR) and the Generation R (GENR) study. Both cohorts include children of Northern European Dutch background (N = 3102 and 2826, respectively) who were genotyped on different platforms. We explore imputation and phasing as a tool and compare three GRM-building strategies, in which data from the two cohorts are (1) simply combined, (2) pre-combined and cross-platform imputed, and (3) cross-platform imputed and post-combined. We test these three strategies with data on childhood height for unrelated individuals (N = 3124, average age 6.7 years) to explore their effect on SNP-heritability estimates, and compare the results to those obtained from the independent studies. All combination strategies result in SNP-heritability estimates with a standard error smaller than those of the independent studies. We did not observe a significant difference in estimates of SNP-heritability based on the various cross-platform imputed GRMs. SNP-heritability of childhood height was on average estimated as 0.50 (SE = 0.10). Introducing cohort as a covariate resulted in a drop of about 2%. Adjustment for principal components (PCs) resulted in SNP-heritability estimates of about 0.39 (SE = 0.11). Strikingly, we did not find a significant difference between cross-platform imputed and combined GRMs. All estimates were significant regardless of PC adjustment. Based on these analyses, we conclude that imputation with a reference set helps to increase the power to estimate SNP-heritability by combining cohorts of the same ethnicity genotyped on different platforms.
However, important factors should be taken into account, such as remaining cohort stratification after imputation and/or phenotypic heterogeneity between and within cohorts. Whether one should use imputation or simply combine the genotype data depends on the number of overlapping SNPs relative to the total number of genotyped SNPs in both cohorts, and on the ability of those SNPs to tag all the genetic variance related to the specific trait of interest.
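As a rough illustration of the "just combined" strategy, the sketch below stacks two cohorts on their overlapping SNPs and builds a GRM from the result. It is a minimal example, not the pipeline used in the study: the helper names `combine_on_overlap` and `build_grm`, the toy genotypes, and the SNP identifiers are all hypothetical, and the GRM uses the common standardisation in which each SNP is centred by twice its allele frequency and scaled by its expected variance. It also assumes that alleles are already coded consistently (same strand and counted allele) across the two platforms.

```python
import numpy as np

def combine_on_overlap(geno_a, snps_a, geno_b, snps_b):
    """Stack two cohorts, keeping only SNPs genotyped in both (hypothetical helper)."""
    shared = sorted(set(snps_a) & set(snps_b))
    col_a = {snp: i for i, snp in enumerate(snps_a)}
    col_b = {snp: i for i, snp in enumerate(snps_b)}
    a = np.asarray(geno_a)[:, [col_a[s] for s in shared]]
    b = np.asarray(geno_b)[:, [col_b[s] for s in shared]]
    return np.vstack([a, b]), shared

def build_grm(genotypes):
    """GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0            # allele frequency in the combined sample
    keep = (p > 0) & (p < 1)            # monomorphic SNPs carry no information
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / z.shape[1]         # average over SNPs

# Toy data: cohort A has 3 individuals on 4 SNPs, cohort B has 2 on 3 SNPs.
snps_a = ["rs1", "rs2", "rs3", "rs4"]
snps_b = ["rs3", "rs2", "rs5"]
geno_a = np.array([[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1]])
geno_b = np.array([[2, 1, 0], [0, 2, 1]])

combined, shared = combine_on_overlap(geno_a, snps_a, geno_b, snps_b)
grm = build_grm(combined)
print(shared)          # SNPs used for the GRM
print(grm.round(2))    # 5 x 5 relationship matrix
```

With few overlapping SNPs, as in this toy case where only two remain, the GRM is built from a small fraction of the genotyped markers. This is the situation in which cross-platform imputation before building the GRM becomes the more attractive option.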