National Centre for Register-Based Research, Aarhus University, Aarhus 8210, Denmark.
Department of Computational Biology, Institut Pasteur, Paris 75015, France; Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
Am J Hum Genet. 2022 Jan 6;109(1):12-23. doi: 10.1016/j.ajhg.2021.11.008.
The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.
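As a rough illustration of the portability comparison described in the abstract, the following sketch computes the prediction accuracy of a polygenic score within each ancestry group and expresses it relative to a reference group. The abstract does not state the accuracy metric; squared Pearson correlation is an assumption made here for illustration, and the function names, group labels, and simulated data are all hypothetical rather than taken from the study.

```python
import numpy as np


def squared_correlation(score, phenotype):
    """Squared Pearson correlation between a polygenic score and a phenotype."""
    r = np.corrcoef(score, phenotype)[0, 1]
    return r ** 2


def relative_accuracy(scores, phenotypes, groups, reference_group):
    """Per-group accuracy divided by the accuracy in the reference group.

    Assumption: accuracy is measured as squared Pearson correlation.
    Returns a dict mapping each group label to r2_group / r2_reference.
    """
    scores = np.asarray(scores, dtype=float)
    phenotypes = np.asarray(phenotypes, dtype=float)
    groups = np.asarray(groups)

    r2 = {}
    for g in np.unique(groups):
        mask = groups == g
        r2[str(g)] = squared_correlation(scores[mask], phenotypes[mask])

    ref = r2[reference_group]
    return {g: value / ref for g, value in r2.items()}


# Toy data only: three hypothetical groups in which the score explains
# progressively less of the phenotype. The effect sizes are made up.
rng = np.random.default_rng(0)
n = 5000
labels, all_scores, all_phenos = [], [], []
for name, effect in [("reference", 1.0), ("near", 0.6), ("far", 0.3)]:
    s = rng.standard_normal(n)
    y = effect * s + rng.standard_normal(n)  # phenotype = signal + noise
    labels.extend([name] * n)
    all_scores.append(s)
    all_phenos.append(y)

rel = relative_accuracy(
    np.concatenate(all_scores),
    np.concatenate(all_phenos),
    labels,
    reference_group="reference",
)
for name, value in rel.items():
    print(f"{name}: {value:.2f}")
```

The reference group has a relative accuracy of 1 by construction, so values below 1 in the other groups indicate reduced portability. Plotting these ratios against a measure of genetic distance from the training population would be one way to visualize the decay the abstract describes.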