Blanc Jennifer, Berg Jeremy J
Department of Human Genetics, University of Chicago, 920 E 58th St CLSC, Chicago, IL 60637, USA.
Genetics. 2025 Jun 4;230(2). doi: 10.1093/genetics/iyaf071.
Polygenic scores have become an important tool in human genetics, enabling the prediction of individuals' phenotypes from their genotypes. Understanding how the pattern of differences in polygenic score predictions across individuals intersects with variation in ancestry can provide insights into the evolutionary forces acting on the trait in question and is important for understanding health disparities. However, because most polygenic scores are computed using effect estimates from population samples, they are susceptible to confounding by both genetic and environmental effects that are correlated with ancestry. The extent to which this confounding drives patterns in the distribution of polygenic scores depends on the patterns of population structure in both the original estimation panel and the prediction/test panel. Here, we use theory from population and statistical genetics, together with simulations, to study the procedure of testing for an association between polygenic scores and axes of ancestry variation in the presence of confounding. We use a general model of genetic relatedness to describe how confounding in the estimation panel biases the distribution of polygenic scores in ways that depend on the degree of overlap in population structure between panels. We then show how this confounding can bias tests for associations between polygenic scores and important axes of ancestry variation in the test panel. Specifically, for any given test, there exists a single axis of population structure in the genome-wide association study (GWAS) panel that needs to be controlled for in order to protect the test. In the context of this result, we study the behavior of multiple approaches to control for stratification along this axis, including standard methods such as using principal components as fixed covariates in the GWAS, linear mixed models, and a novel approach for directly estimating the axis using the test panel genotypes.
Our analyses highlight the role of estimation noise in the models of population structure as a plausible source of residual confounding in polygenic score analyses.
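The confounding mechanism described above can be illustrated with a toy simulation. The sketch below is not the authors' model or method. It assumes two discrete populations shared by the GWAS and test panels (complete overlap in structure), a phenotype with no genetic effect at all, and a purely environmental mean shift between populations. All names and parameter values (`simulate`, `env_shift`, the drift standard deviation of 0.05) are hypothetical choices made for illustration. Under these assumptions the axis that needs to be controlled for is simply the population label, which the sketch treats as known exactly rather than estimated.

```python
import numpy as np


def simulate(seed=0, n_snps=2000, n_gwas=1000, n_test=200, env_shift=1.0):
    """Toy two-population example of stratification bias in polygenic scores."""
    rng = np.random.default_rng(seed)

    # Allele frequencies: a shared ancestral frequency plus independent drift.
    p_anc = rng.uniform(0.1, 0.9, n_snps)
    p_a = np.clip(p_anc + rng.normal(0, 0.05, n_snps), 0.02, 0.98)
    p_b = np.clip(p_anc + rng.normal(0, 0.05, n_snps), 0.02, 0.98)

    def panel(n):
        # First half of the panel is population A (0), second half is B (1).
        label = np.repeat([0.0, 1.0], n // 2)
        freqs = np.where(label[:, None] == 1.0, p_b, p_a)
        return rng.binomial(2, freqs).astype(float), label

    g_gwas, pop_gwas = panel(n_gwas)
    g_test, pop_test = panel(n_test)

    # Phenotype: no genetic effects, only an environmental shift in population B.
    y = env_shift * pop_gwas + rng.normal(0, 1, n_gwas)

    def marginal_effects(g, y, covariate=None):
        # Per-SNP least-squares slopes; optionally residualize on one covariate.
        g = g - g.mean(axis=0)
        y = y - y.mean()
        if covariate is not None:
            c = covariate - covariate.mean()
            g = g - np.outer(c, c @ g) / (c @ c)
            y = y - c * (c @ y) / (c @ c)
        return (g.T @ y) / (g ** 2).sum(axis=0)

    results = {}
    for name, cov in [("uncorrected", None), ("corrected", pop_gwas)]:
        beta = marginal_effects(g_gwas, y, cov)
        pgs = g_test @ beta  # polygenic score in the test panel
        # Association between the score and the ancestry axis in the test panel.
        results[name] = np.corrcoef(pgs, pop_test)[0, 1]
    return results


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name}: correlation of PGS with test-panel population = {r:.3f}")
```

In this setup the uncorrected scores should track the test-panel population label strongly even though the trait has no genetic basis, while including the label as a covariate in the GWAS removes the bias in expectation. Real analyses must estimate the relevant axis (for example with principal components), and the noise in that estimate is what the abstract identifies as a plausible source of residual confounding.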