Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States.
eLife. 2020 Nov 17;9:e61548. doi: 10.7554/eLife.61548.
Population stratification continues to bias the results of genome-wide association studies (GWAS). When these results are used to construct polygenic scores, even subtle biases can cumulatively lead to large errors. To study the effect of residual stratification, we simulated GWAS under realistic models of demographic history. We show that when population structure is recent, it cannot be corrected using principal components of common variants because they are uninformative about recent history. Consequently, polygenic scores are biased in that they recapitulate environmental structure. Principal components calculated from rare variants or identity-by-descent segments can correct this stratification for some types of environmental effects. While family-based studies are immune to stratification, the hybrid approach of ascertaining variants in GWAS but reestimating effect sizes in siblings reduces but does not eliminate stratification. We show that the effect of population stratification depends not only on allele frequencies and environmental structure but also on demographic history.
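The confounding mechanism described in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the authors' simulation framework: it assumes two discrete subpopulations, a single variant with no causal effect, and an environmental shift in the phenotype of one subpopulation. All function names, allele frequencies, and sample sizes are hypothetical choices for illustration. The known subpopulation label stands in for a covariate that captures the structure perfectly, which is the role a principal component plays when it is informative. The sketch does not reproduce the abstract's main finding, that principal components of common variants fail to capture recent structure.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def center_within(values, labels):
    """Subtract each group's mean; equivalent to adjusting for group membership."""
    totals, counts = {}, {}
    for v, g in zip(values, labels):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, labels)]


def simulate(n_per_pop=5000, freqs=(0.2, 0.8), env_shift=1.0, seed=1):
    """Simulate a non-causal variant in two subpopulations (assumed values).

    The phenotype depends only on an environmental shift applied to the
    second subpopulation plus Gaussian noise, so the true genetic effect is 0.
    """
    rng = random.Random(seed)
    genotypes, phenotypes, labels = [], [], []
    for pop, freq in enumerate(freqs):
        for _ in range(n_per_pop):
            # Diploid genotype: number of alternate alleles (0, 1, or 2).
            g = int(rng.random() < freq) + int(rng.random() < freq)
            y = env_shift * pop + rng.gauss(0.0, 1.0)
            genotypes.append(g)
            phenotypes.append(y)
            labels.append(pop)
    return genotypes, phenotypes, labels


genotypes, phenotypes, labels = simulate()

# Unadjusted estimate: picks up the environmental difference between groups.
naive_effect = ols_slope(genotypes, phenotypes)

# Adjusted estimate: structure removed by centering within each subpopulation.
adjusted_effect = ols_slope(
    center_within(genotypes, labels), center_within(phenotypes, labels)
)

print(f"naive effect estimate:    {naive_effect:.3f}")
print(f"adjusted effect estimate: {adjusted_effect:.3f}")
```

Under these assumptions the unadjusted estimate is clearly nonzero even though the variant has no effect, while the adjusted estimate is close to zero. Summing many such biased estimates into a polygenic score is how subtle per-variant bias can accumulate into a score that tracks environmental structure.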