Pärna Katri, Nolte Ilja M, Snieder Harold, Fischer Krista, Marnetto Davide, Pagani Luca
Institute of Genomics, University of Tartu, Tartu, Estonia.
Department of Epidemiology, University of Groningen, Groningen, Netherlands.
Front Genet. 2022 Jul 18;13:899523. doi: 10.3389/fgene.2022.899523. eCollection 2022.
One important confounder in genome-wide association studies (GWASs) is population genetic structure, which may generate spurious associations if not properly accounted for. This may ultimately result in biased polygenic risk score (PRS) predictions, especially when the PRS is applied to another population. To explore this matter, we focused on principal component analysis (PCA) and asked whether a population-genetics-informed strategy, based on principal components (PCs) derived from an external reference population, helps mitigate this PRS transferability issue. Throughout the study, we used two complex model traits, height and body mass index, and samples from the UK and Estonian Biobanks. We aimed to investigate 1) whether computing the PCs adjusted for in the discovery cohort on a reference population (1000G) improves the performance of the resulting PRS in a target set from another population and 2) whether adjusting the validation model for PCs is required at all. Our results showed that any other set of PCs performed worse than the one computed on samples from the same population as the discovery dataset. Furthermore, we show that PC correction in GWAS cannot prevent residual population structure information from remaining in the PRS, even for non-structured traits. Therefore, we confirm the utility of PC correction in the validation model when the investigated trait shows an actual correlation with population genetic structure, to account for the residual confounding effect when evaluating the predictive value of the PRS.
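The role of PC correction in the validation model can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, and the names `r_squared` and `incremental_r2`, the effect sizes, and the use of two PCs are all assumptions made for illustration. The sketch compares the variance explained by a PRS alone with the incremental variance it explains once the PCs are already in the model, for a trait that is itself correlated with population structure.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def incremental_r2(prs, pcs, y):
    """Gain in R^2 from adding the PRS to a model that already contains the PCs."""
    base = r_squared(pcs, y)
    full = r_squared(np.column_stack([pcs, prs]), y)
    return full - base

# Simulated toy data (hypothetical values, not taken from the study).
rng = np.random.default_rng(0)
n = 5000
pcs = rng.normal(size=(n, 2))      # stand-in for the top genetic PCs
genetic = rng.normal(size=n)       # true genetic component of the trait

# Assumption: the PRS retains residual structure, so it correlates with PC1.
prs = genetic + 0.5 * pcs[:, 0]

# Assumption: the trait is also correlated with structure through PC1.
y = 0.3 * genetic + 0.4 * pcs[:, 0] + rng.normal(size=n)

unadjusted = r_squared(prs.reshape(-1, 1), y)   # PRS-only validation model
adjusted = incremental_r2(prs, pcs, y)          # PRS on top of the PCs

print(f"R^2 of PRS alone:           {unadjusted:.3f}")
print(f"Incremental R^2 beyond PCs: {adjusted:.3f}")
```

In this toy setting the unadjusted figure is inflated because part of the apparent predictive value of the PRS comes from its residual correlation with PC1 rather than from the genetic signal. Adding the PCs to the validation model removes that share.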