Department of Population Health Sciences, University of Leicester, Leicester, United Kingdom.
Division of Genetics and Epidemiology, The Institute of Cancer Research, London, United Kingdom.
PLoS Genet. 2024 Apr 17;20(4):e1011212. doi: 10.1371/journal.pgen.1011212. eCollection 2024 Apr.
Population differences in disease risk are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores. Existing studies that use this approach, however, do not account for the statistical noise in effect estimates (i.e., the GWAS betas) that arises from the finite sample size of the GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations depends strongly on the GWAS training sample size, the polygenicity (number of causal variants), and the genetic distance (FST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations. We show that this test has calibrated type 1 error rates under the simplifying assumption that all SNPs are independent, which we achieve in practice using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as a simple t-test, have very high type 1 error rates. Applying our approach to prostate cancer, we find the highest genetic risk in men of African ancestry, followed by men of European ancestry and then men of East Asian ancestry.
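The general shape of a Wald test on a mean polygenic score difference can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `wald_test_pgs_difference`, its arguments, and the inputs are hypothetical. It assumes independent (LD-pruned) SNPs, a posterior mean and posterior variance for each SNP effect, and allele frequencies that are treated as known without error in both populations. Under those assumptions, the difference in mean score is a weighted sum of effects, its variance is the matching weighted sum of effect variances, and the squared difference divided by that variance is compared with a chi-square distribution with 1 degree of freedom.

```python
import math


def wald_test_pgs_difference(beta_mean, beta_var, freq_a, freq_b):
    """Wald test for a difference in mean polygenic score between two populations.

    Hypothetical sketch. Assumes SNPs are independent and allele
    frequencies are known exactly, so the only uncertainty comes from
    the effect estimates.

    beta_mean: posterior mean effect per SNP
    beta_var:  posterior variance of the effect per SNP
    freq_a:    effect-allele frequency per SNP in population A
    freq_b:    effect-allele frequency per SNP in population B
    """
    diff = 0.0      # estimated difference in mean score (A minus B)
    variance = 0.0  # variance of that difference due to effect uncertainty
    for b, v, pa, pb in zip(beta_mean, beta_var, freq_a, freq_b):
        dosage_diff = 2.0 * (pa - pb)  # difference in expected allele dosage
        diff += dosage_diff * b
        variance += dosage_diff * dosage_diff * v
    if variance <= 0.0:
        raise ValueError("variance of the score difference must be positive")
    wald = diff * diff / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return diff, variance, wald, p_value


if __name__ == "__main__":
    # Made-up numbers for two SNPs, purely for illustration.
    result = wald_test_pgs_difference(
        beta_mean=[0.1, -0.05],
        beta_var=[0.01, 0.01],
        freq_a=[0.5, 0.2],
        freq_b=[0.3, 0.4],
    )
    print(result)
```

In this sketch, smaller posterior variances (as would be expected from a larger training GWAS) shrink the denominator, so the same score difference yields a larger test statistic. Setting every entry of `beta_var` close to zero mimics treating the effects as known, which is the situation in which a conventional test would overstate significance.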