Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia.
Division of Psychiatry, University of Edinburgh, Edinburgh, UK.
Am J Hum Genet. 2023 Jul 6;110(7):1207-1215. doi: 10.1016/j.ajhg.2023.06.006. Epub 2023 Jun 27.
In polygenic score (PGS) analysis, the coefficient of determination (R²) is a key statistic for evaluating efficacy. R² is the proportion of phenotypic variance explained by the PGS, calculated in a cohort that is independent of the genome-wide association study (GWAS) that provided estimates of allelic effect sizes. The SNP-based heritability (h²_SNP, the proportion of total phenotypic variance attributable to all common SNPs) is the theoretical upper limit of the out-of-sample prediction R². However, in real data analyses R² has been reported to exceed h²_SNP, which occurs in parallel with the observation that h²_SNP estimates tend to decline as the number of cohorts being meta-analyzed increases. Here, we quantify why and when these observations are expected. Using theory and simulation, we show that if heterogeneities in cohort-specific h²_SNP exist, or if genetic correlations between cohorts are less than one, h²_SNP estimates can decrease as the number of cohorts being meta-analyzed increases. We derive the conditions under which the out-of-sample prediction R² will be greater than h²_SNP and show the validity of our derivations with real data from a binary trait (major depression) and a continuous trait (educational attainment). Our research calls for a better approach to integrating information from multiple cohorts to address issues of between-cohort heterogeneity.
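The direction of the effect described above can be illustrated with a toy calculation. The sketch below is not the derivation from the paper; it assumes a deliberately simplified model in which each cohort has its own vector of true per-SNP effects, effects are correlated between any two cohorts with a common genetic correlation rg, sampling error is ignored, and the meta-analysis is an equal-weight average of cohort effects. The function name implied_meta_h2 and its parameters are hypothetical, introduced only for this illustration. Under these assumptions, the variance explained by the averaged effects is the variance of a mean of correlated quantities, which shrinks as cohorts are added whenever rg is below one or the cohort-specific heritabilities differ.

```python
import math


def implied_meta_h2(h2_by_cohort, rg):
    """Variance explained by equal-weight averaged SNP effects (toy model).

    Assumptions (not taken from the paper): cohort k has SNP-based
    heritability h2_by_cohort[k]; true effects in any two different cohorts
    have correlation rg; sampling error is ignored.
    """
    k = len(h2_by_cohort)
    # Per-cohort "genetic standard deviations" sqrt(h2_k).
    sd = [math.sqrt(h2) for h2 in h2_by_cohort]
    # Sum of within-cohort variances.
    own = sum(h2_by_cohort)
    # Sum over ordered pairs j != k of rg * sd_j * sd_k.
    cross = rg * (sum(sd) ** 2 - own)
    # Variance of the equal-weight mean of the k cohort effects.
    return (own + cross) / k ** 2


if __name__ == "__main__":
    # Equal cohort-specific heritability of 0.2, genetic correlation 0.8.
    for k in (1, 2, 5, 10, 50):
        print(k, round(implied_meta_h2([0.2] * k, 0.8), 4))
    # Perfect genetic correlation but heterogeneous heritabilities.
    print(round(implied_meta_h2([0.1, 0.3], 1.0), 4))
```

With equal heritabilities h2 the expression reduces to h2 * (1 + (K - 1) * rg) / K, which equals h2 when rg is one and falls toward h2 * rg as the number of cohorts K grows. With rg equal to one but unequal heritabilities, the result is the square of the mean genetic standard deviation, which is never larger than the mean heritability.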