Mefford Joel, Smullen Molly, Zhang Felix, Sadowski Michal, Border Richard, Dahl Andy, Flint Jonathan, Zaitlen Noah
Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA, USA.
Chan Medical School, University of Massachusetts, Worcester, MA, USA.
Am J Hum Genet. 2025 Jun 5;112(6):1363-1375. doi: 10.1016/j.ajhg.2025.04.013.
Polygenic scores (PGSs) are genetic predictions of trait values or disease risk that are increasingly used in clinical predictive models and basic genetics research. However, the predictive value of a PGS can vary within similar population groups, depending on characteristics such as the environmental exposures, sex, age, or socioeconomic status of the individuals. To maximize the value of a PGS, it would be useful to have approaches that screen trait-PGS pairs for evidence of such heterogeneity without having to specify the relevant exposure or individual characteristics. Here, in analyses from the UK Biobank, we show that a PGS's predictive accuracy depends on the quantile of the phenotypic distribution to which the PGS is being applied. We quantify differences in predictive value across the phenotypic range using quantile regression to estimate quantile-specific effect sizes in linear models of phenotype values as a function of PGS. Of 25 continuous traits, only three have no quantile-specific effect size that differs by at least 1.2-fold from the ordinary least squares estimate. Through simulation, we demonstrate that this heterogeneity of PGS predictive value can arise from gene-by-environment interactions. Our approach can be used to flag traits where the use of PGSs warrants extra caution and where stratification variables should perhaps be sought and used, because PGSs perform substantially differently in portions of the sampled population than expected from quoted predictive R² or incremental R² values, which represent average performance across a dataset.
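The screening idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' pipeline: it assumes a toy gene-by-environment model with arbitrarily chosen coefficients, fits quantile-specific slopes with a simple iteratively reweighted least squares approximation to the quantile (check) loss rather than a dedicated quantile regression solver, and treats "1.2-fold" symmetrically (a ratio of at least 1.2 or at most 1/1.2), which is an assumption. All names (`quantile_slope`, `flagged`, and so on) are hypothetical.

```python
import numpy as np

def quantile_slope(x, y, tau, n_iter=100, eps=1e-6):
    """Approximate the slope of a linear quantile regression of y on x.

    Uses iteratively reweighted least squares on the check loss; this is
    an illustrative approximation, not an exact linear-programming solver.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        # Check-loss weights: tau above the fitted line, 1 - tau below it,
        # divided by |residual| so that w * r**2 equals the check loss.
        w = np.where(r > 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        converged = np.max(np.abs(beta_new - beta)) < 1e-8
        beta = beta_new
        if converged:
            break
    return beta[1]

# Toy simulation (all coefficients are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n = 20_000
pgs = rng.standard_normal(n)   # standardized polygenic score
env = rng.standard_normal(n)   # unobserved environmental exposure
noise = rng.standard_normal(n)
# The PGS effect is larger when the exposure is high (gene-by-environment).
pheno = 0.3 * pgs + 1.0 * env + 0.2 * pgs * env + 0.5 * noise

# Ordinary least squares slope: the average effect across the dataset.
X = np.column_stack([np.ones(n), pgs])
ols_slope = np.linalg.lstsq(X, pheno, rcond=None)[0][1]

# Quantile-specific slopes across the phenotypic range.
taus = [0.1, 0.25, 0.5, 0.75, 0.9]
slopes = {tau: quantile_slope(pgs, pheno, tau) for tau in taus}
ratios = {tau: slopes[tau] / ols_slope for tau in taus}

# Flag the trait if any quantile-specific slope differs from the OLS
# slope by at least 1.2-fold in either direction (assumed interpretation).
flagged = any(r >= 1.2 or r <= 1 / 1.2 for r in ratios.values())

for tau in taus:
    print(f"tau={tau:.2f}  slope={slopes[tau]:.3f}  ratio to OLS={ratios[tau]:.2f}")
print("flagged:", flagged)
```

In this toy setup the exposure is never passed to the screen, yet the slopes rise from the lower to the upper quantiles of the phenotype, which is the kind of pattern the abstract describes as a signal to look for stratification variables.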