Evaluation of polygenic prediction methodology within a reference-standardized framework.

Affiliations

Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.

NIHR Maudsley Biomedical Research Centre, South London and Maudsley NHS Trust, London, United Kingdom.

Publication information

PLoS Genet. 2021 May 4;17(5):e1009021. doi: 10.1371/journal.pgen.1009021. eCollection 2021 May.

DOI: 10.1371/journal.pgen.1009021

PMID: 33945532

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8121285/

Abstract

The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
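The selection step described in the abstract (using 10-fold cross-validation to pick the most predictive parameter, then reporting a relative improvement in the correlation between observed and predicted values) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy data, the labels `param_a` to `param_c`, and the helper names `pearson`, `select_by_cv` and `relative_improvement` are all hypothetical, and the rule of choosing the score with the highest training-fold correlation is an assumption made for illustration. The elastic net models mentioned in the abstract are not covered here.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_by_cv(scores, outcome, k=10, seed=0):
    """Choose one candidate score per training split and test it on the held-out fold.

    scores: dict mapping a parameter label to a list of per-individual scores.
    Returns (mean held-out correlation, list of labels chosen in each fold).
    """
    idx = list(range(len(outcome)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all individuals
    held_out_r, chosen = [], []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        y_train = [outcome[i] for i in train]
        # Assumed selection rule: highest correlation with the outcome in training data.
        best = max(
            scores,
            key=lambda lab: pearson([scores[lab][i] for i in train], y_train),
        )
        chosen.append(best)
        held_out_r.append(
            pearson([scores[best][i] for i in test], [outcome[i] for i in test])
        )
    return sum(held_out_r) / k, chosen


def relative_improvement(r_method, r_baseline):
    """Relative gain in correlation; 0.17 means 17% better than the baseline."""
    return (r_method - r_baseline) / r_baseline


# Toy data (hypothetical): a latent genetic signal plus noise.
rng = random.Random(1)
n = 2000
signal = [rng.gauss(0, 1) for _ in range(n)]
outcome = [0.4 * g + rng.gauss(0, 1) for g in signal]
# Candidate scores differ only in how much noise they carry.
noise_sd = {"param_a": 2.0, "param_b": 1.0, "param_c": 0.5}
scores = {lab: [g + rng.gauss(0, sd) for g in signal] for lab, sd in noise_sd.items()}

mean_r, chosen = select_by_cv(scores, outcome)
baseline_r = pearson(scores["param_a"], outcome)
print(f"mean held-out r = {mean_r:.3f}; chosen per fold = {chosen}")
print(f"relative improvement over param_a = {relative_improvement(mean_r, baseline_r):.1%}")
```

Because the parameter is chosen on the training folds and scored only on the held-out fold, the reported correlation is not inflated by the selection itself, which is the reason for using cross-validation rather than picking the best score on the full sample.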

