Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction.

Authors

Chen Chia-Yen, Han Jiali, Hunter David J, Kraft Peter, Price Alkes L

Affiliations

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.

Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America.

Publication information

Genet Epidemiol. 2015 Sep;39(6):427-38. doi: 10.1002/gepi.21906. Epub 2015 May 21.

DOI: 10.1002/gepi.21906

PMID: 25995153

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4734143/

Abstract

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color (HC), tanning ability (TA), and basal cell carcinoma (BCC) in European Americans (sample size from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRSs) and best linear unbiased prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R² for HC increased by 66% (0.0456 to 0.0755; P < 10⁻¹⁶), the R² for TA increased by 123% (0.0154 to 0.0344; P < 10⁻¹⁶), and the liability-scale R² for BCC increased by 68% (0.0138 to 0.0232; P < 10⁻¹⁶) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being overweighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be underweighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
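
The comparison described in the abstract (a PRS built without ancestry correction versus a PRS with ancestry as a separate model component) can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the data are simulated from two hypothetical subpopulations, ancestry is approximated by the top principal component, a single train/tune/test split stands in for the 10-fold cross-validation used in the paper, and all function names and parameter values (`simulate`, `compare`, `fst`, `anc_effect`, and so on) are invented for illustration.

```python
import numpy as np


def simulate(n=3000, m=500, fst=0.02, anc_effect=0.5, h2=0.3, seed=0):
    """Simulate standardized genotypes and a phenotype in a two-population sample.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, m)                 # ancestral allele frequencies
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    freqs = rng.beta(a, b, size=(2, m))              # drifted per-population frequencies
    pop = rng.integers(0, 2, n)                      # population label per individual
    G = rng.binomial(2, freqs[pop]).astype(float)
    # Standardized on the full sample for brevity.
    G = (G - G.mean(0)) / (G.std(0) + 1e-12)
    beta = rng.normal(0, np.sqrt(h2 / m), m)         # small effect at every SNP
    noise = rng.normal(0, np.sqrt(1 - h2), n)
    y = G @ beta + anc_effect * (pop - pop.mean()) + noise
    return G, y


def marginal_effects(G, y):
    """Least-squares slope of y on each SNP, fitted one SNP at a time."""
    Gc = G - G.mean(0)
    yc = y - y.mean()
    return Gc.T @ yc / (Gc ** 2).sum(0)


def residualize(X, pc):
    """Remove the part of X (vector or matrix columns) explained by intercept + pc."""
    Z = np.column_stack([np.ones(len(pc)), pc])
    coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return X - Z @ coef


def r2(y, yhat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, yhat)[0, 1] ** 2)


def compare(G, y, seed=0):
    """Return (R2 of uncorrected PRS, R2 of PC + ancestry-adjusted PRS) on a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_tr, n_tu = int(0.6 * len(y)), int(0.2 * len(y))
    tr, tu, te = idx[:n_tr], idx[n_tr:n_tr + n_tu], idx[n_tr + n_tu:]

    # Ancestry proxy: top principal component, with loadings learned on training data.
    mu = G[tr].mean(0)
    _, _, vt = np.linalg.svd(G[tr] - mu, full_matrices=False)
    pc = (G - mu) @ vt[0]

    # (1) PRS without ancestry correction: ancestry leaks into every SNP weight.
    prs_raw = G @ marginal_effects(G[tr], y[tr])
    r2_raw = r2(y[te], prs_raw[te])

    # (2) Ancestry as a separate component: SNP weights are estimated after
    #     regressing the PC out of genotypes and phenotype; the PC and the PRS
    #     then get their own coefficients, fitted on the held-out tuning set.
    w_adj = marginal_effects(residualize(G[tr], pc[tr]), residualize(y[tr], pc[tr]))
    prs_adj = G @ w_adj

    def design(rows):
        return np.column_stack([np.ones(len(rows)), pc[rows], prs_adj[rows]])

    coef, *_ = np.linalg.lstsq(design(tu), y[tu], rcond=None)
    r2_adj = r2(y[te], design(te) @ coef)
    return r2_raw, r2_adj


G, y = simulate()
r2_raw, r2_adj = compare(G, y)
print(f"PRS without ancestry correction: R2 = {r2_raw:.4f}")
print(f"PC + ancestry-adjusted PRS:      R2 = {r2_adj:.4f}")
```

Fitting the PC and PRS coefficients on a separate tuning set is a design choice of this sketch; it avoids reusing the samples that produced the SNP weights. The resulting R² values depend on the simulated parameters and should not be read as reproducing the figures reported above.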
