Evaluation of polygenic prediction methodology within a reference-standardized framework.

Author affiliations

Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.

NIHR Maudsley Biomedical Research Centre, South London and Maudsley NHS Trust, London, United Kingdom.

Publication information

PLoS Genet. 2021 May 4;17(5):e1009021. doi: 10.1371/journal.pgen.1009021. eCollection 2021 May.

Abstract

The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR. Their performance in predicting outcomes was evaluated in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly when 10-fold cross-validation was used to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16-18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
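
The parameter-selection step described in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: the data are simulated, the score names (`threshold_a` and so on) and the helper functions are hypothetical, and a simple univariate linear model stands in for whatever model the authors fitted. The idea shown is choosing, by 10-fold cross-validation, the candidate polygenic score whose held-out predictions correlate best with the observed outcome, then expressing a gain as a relative improvement in that correlation.

```python
import numpy as np


def cv_correlation(score, outcome, n_folds=10, seed=0):
    """Correlation between observed outcomes and held-out linear predictions."""
    rng = np.random.default_rng(seed)
    n = len(outcome)
    folds = np.array_split(rng.permutation(n), n_folds)
    predicted = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Fit outcome ~ score on the training folds only.
        slope, intercept = np.polyfit(score[train], outcome[train], 1)
        predicted[held_out] = slope * score[held_out] + intercept
    return np.corrcoef(predicted, outcome)[0, 1]


def select_best_score(scores, outcome, n_folds=10, seed=0):
    """Return the name of the best-performing score and all CV correlations."""
    cors = {name: cv_correlation(s, outcome, n_folds, seed)
            for name, s in scores.items()}
    return max(cors, key=cors.get), cors


def relative_improvement(r_method, r_baseline):
    """Percentage improvement of one correlation over a baseline correlation."""
    return 100.0 * (r_method - r_baseline) / r_baseline


# Simulated example (assumption): one true genetic signal, and three candidate
# scores that measure it with increasing amounts of noise.
rng = np.random.default_rng(1)
n_people = 5000
signal = rng.normal(size=n_people)
outcome = 0.5 * signal + rng.normal(size=n_people)
scores = {
    "threshold_a": signal + rng.normal(scale=0.5, size=n_people),
    "threshold_b": signal + rng.normal(scale=1.0, size=n_people),
    "threshold_c": signal + rng.normal(scale=2.0, size=n_people),
}

best_name, cv_cors = select_best_score(scores, outcome)
print(best_name, {k: round(v, 3) for k, v in cv_cors.items()})
```

A multi-score elastic net model, as evaluated in the study, would instead fit all candidate scores jointly with a penalized regression; the sketch above covers only the single-best-score strategy.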

