Chen Chia-Yen, Han Jiali, Hunter David J, Kraft Peter, Price Alkes L
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.
Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America.
Genet Epidemiol. 2015 Sep;39(6):427-38. doi: 10.1002/gepi.21906. Epub 2015 May 21.
Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color (HC), tanning ability (TA), and basal cell carcinoma (BCC) in European Americans (sample sizes ranging from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRSs) and best linear unbiased prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R² for HC increased by 66% (from 0.0456 to 0.0755; P < 10⁻¹⁶), the R² for TA increased by 123% (from 0.0154 to 0.0344; P < 10⁻¹⁶), and the liability-scale R² for BCC increased by 68% (from 0.0138 to 0.0232; P < 10⁻¹⁶) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being overweighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be underweighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
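The comparison described in the abstract (a PRS built without ancestry correction versus a prediction that carries ancestry as a separate component) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the data are simulated, a single top principal component of the training genotypes stands in for ancestry, SNP effects are simple marginal least-squares slopes, and all function names and parameter values (`simulate`, `marginal_effects`, `compare`, the effect sizes) are hypothetical.

```python
import numpy as np

def r2(pred, y):
    """Squared Pearson correlation between a predictor and the phenotype."""
    return float(np.corrcoef(pred, y)[0, 1] ** 2)

def simulate(n, m, rng):
    """Toy structured population (assumption): one ancestry axis shifts
    both the allele frequencies and the trait."""
    ancestry = rng.normal(size=n)
    base = rng.uniform(0.2, 0.8, size=m)
    shift = rng.normal(scale=0.05, size=m)        # per-SNP frequency gradient
    freq = np.clip(base + np.outer(ancestry, shift), 0.05, 0.95)
    geno = rng.binomial(2, freq).astype(float)    # 0/1/2 allele counts
    beta = rng.normal(scale=0.03, size=m)         # small polygenic effects
    pheno = geno @ beta + 0.5 * ancestry + rng.normal(size=n)
    return geno, pheno

def marginal_effects(x, y):
    """Per-SNP least-squares slopes; columns of x and y are assumed centered."""
    return (x.T @ y) / (x ** 2).sum(axis=0)

def compare(n=2000, m=400, n_test=500, seed=0):
    rng = np.random.default_rng(seed)
    geno, pheno = simulate(n, m, rng)
    g_tr, g_te = geno[:-n_test], geno[-n_test:]
    y_tr, y_te = pheno[:-n_test], pheno[-n_test:]
    mu_g, mu_y = g_tr.mean(axis=0), y_tr.mean()
    x_tr, x_te, yc = g_tr - mu_g, g_te - mu_g, y_tr - mu_y

    # (a) PRS without ancestry correction: ancestry leaks into every SNP weight.
    prs_plain = x_te @ marginal_effects(x_tr, yc)

    # (b) Ancestry as a separate component: top PC of the training genotypes,
    #     with test samples projected onto the same loadings.
    v = np.linalg.svd(x_tr, full_matrices=False)[2][0]
    pc_tr, pc_te = x_tr @ v, x_te @ v
    gamma = (pc_tr @ yc) / (pc_tr @ pc_tr)        # ancestry effect on the trait
    y_res = yc - gamma * pc_tr                    # phenotype with the PC removed
    x_res_tr = x_tr - np.outer(pc_tr, v)          # genotypes with the PC removed
    x_res_te = x_te - np.outer(pc_te, v)
    prs_adj = x_res_te @ marginal_effects(x_res_tr, y_res)
    combined = gamma * pc_te + prs_adj            # ancestry term + adjusted PRS

    return {"prs_unadjusted": r2(prs_plain, y_te),
            "prs_plus_ancestry": r2(combined, y_te)}
```

Calling `print(compare())` reports the held-out R² of both predictors for one simulated data set. The numbers depend entirely on the assumed simulation parameters and are not comparable to the R² values reported in the abstract.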