The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; National Centre for Register-Based Research, Aarhus University, 8210 Aarhus V, Denmark.
The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210 Aarhus V, Denmark; Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000 Aarhus C, Denmark; Center for Genomics and Personalized Medicine, CGPM, Aarhus University, 8000 Aarhus C, Denmark; Bioinformatics Research Centre, Aarhus University, 8000 Aarhus C, Denmark.
Am J Hum Genet. 2021 Jun 3;108(6):1001-1011. doi: 10.1016/j.ajhg.2021.04.014. Epub 2021 May 7.
The accuracy of polygenic risk scores (PRSs) to predict complex diseases increases with the training sample size. PRSs are generally derived based on summary statistics from large meta-analyses of multiple genome-wide association studies (GWASs). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combine data is the meta-analysis of GWAS summary statistics (meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using 12 real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (meta-PRS) is both a simple alternative to meta-GWAS and often more accurate.
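The two data-combining strategies contrasted above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that meta-GWAS means a fixed-effect, inverse-variance-weighted combination of per-variant effect sizes, and that meta-PRS means a least-squares linear combination of precomputed scores fitted on individual-level data. The function names (`meta_gwas`, `fit_meta_prs`, `apply_meta_prs`) are hypothetical, and SCT is not shown.

```python
import numpy as np


def meta_gwas(beta_a, se_a, beta_b, se_b):
    """Combine two sets of per-variant effects by inverse-variance weighting.

    Assumption: a fixed-effect meta-analysis; the abstract does not state
    which meta-analysis model was used.
    """
    w_a = 1.0 / np.square(se_a)  # weight = 1 / variance of each estimate
    w_b = 1.0 / np.square(se_b)
    beta = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    se = np.sqrt(1.0 / (w_a + w_b))
    return beta, se


def fit_meta_prs(prs_matrix, y):
    """Fit weights for a linear combination of PRSs (one column per score).

    Assumption: ordinary least squares on a quantitative trait; a
    case-control trait would more naturally use logistic regression.
    Returns the intercept followed by one weight per PRS column.
    """
    design = np.column_stack([np.ones(len(y)), prs_matrix])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def apply_meta_prs(prs_matrix, coef):
    """Score individuals with the fitted intercept and PRS weights."""
    return coef[0] + prs_matrix @ coef[1:]
```

In this sketch, one column of `prs_matrix` would hold a PRS derived from external summary statistics and another a PRS derived from the individual-level data. The weights should be fitted on individuals who are not used to evaluate prediction accuracy.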