Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores.

Author information

Vilhjálmsson Bjarni J, Yang Jian, Finucane Hilary K, Gusev Alexander, Lindström Sara, Ripke Stephan, Genovese Giulio, Loh Po-Ru, Bhatia Gaurav, Do Ron, Hayeck Tristan, Won Hong-Hee, Kathiresan Sekar, Pato Michele, Pato Carlos, Tamimi Rulla, Stahl Eli, Zaitlen Noah, Pasaniuc Bogdan, Belbin Gillian, Kenny Eimear E, Schierup Mikkel H, De Jager Philip, Patsopoulos Nikolaos A, McCarroll Steve, Daly Mark, Purcell Shaun, Chasman Daniel, Neale Benjamin, Goddard Michael, Visscher Peter M, Kraft Peter, Patterson Nick, Price Alkes L

Affiliations

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark.

Queensland Brain Institute, University of Queensland, Brisbane, 4072 QLD, Australia; Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, 4101 QLD, Australia.

Publication information

Am J Hum Genet. 2015 Oct 1;97(4):576-92. doi: 10.1016/j.ajhg.2015.09.001.

DOI:10.1016/j.ajhg.2015.09.001

PMID:26430803

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4596916/

Abstract

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
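
The contrast drawn in the abstract, between pruning plus thresholding and a posterior mean that uses LD, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the simplest special case, a Gaussian (infinitesimal) prior on effect sizes, for which the posterior mean has a closed form; the full method is understood to use a more general prior fitted by sampling. Genotypes and phenotype are assumed standardized, the z-score approximation `beta * sqrt(N)` is a simplification, and the function names, parameters, and default cutoffs are hypothetical.

```python
import numpy as np


def ldpred_inf_posterior_mean(beta_marginal, ld_matrix, n_train, h2):
    """Posterior mean effects under a Gaussian (infinitesimal) prior.

    beta_marginal : marginal GWAS effect estimates (standardized scale)
    ld_matrix     : M x M SNP correlation matrix from a reference panel
    n_train       : training (GWAS) sample size
    h2            : assumed heritability explained by the M markers
    """
    beta_marginal = np.asarray(beta_marginal, dtype=float)
    m = beta_marginal.shape[0]
    # Ridge-like shrinkage term M / (N * h2): smaller N means more shrinkage.
    shrink = m / (n_train * h2)
    # Solve (D + shrink * I) w = beta_marginal instead of inverting D.
    return np.linalg.solve(ld_matrix + shrink * np.eye(m), beta_marginal)


def prune_and_threshold(beta_marginal, ld_matrix, n_train, r2_max=0.2, z_min=1.96):
    """Greedy LD clumping plus a significance cutoff (a stand-in for P+T).

    Markers are visited from strongest to weakest association; a marker is
    kept only if it passes the z cutoff and its squared correlation with
    every marker already kept is at most r2_max. Dropped markers get weight 0.
    """
    beta_marginal = np.asarray(beta_marginal, dtype=float)
    # Approximate z-scores for standardized genotypes and phenotype.
    z = beta_marginal * np.sqrt(n_train)
    kept = []
    for j in np.argsort(-np.abs(z)):
        if abs(z[j]) < z_min:
            break  # all remaining markers are weaker still
        if all(ld_matrix[j, k] ** 2 <= r2_max for k in kept):
            kept.append(j)
    weights = np.zeros_like(beta_marginal)
    idx = np.array(kept, dtype=int)
    weights[idx] = beta_marginal[idx]
    return weights
```

Both functions take the same summary statistics and LD matrix and return one weight per marker, so the resulting scores can be compared on held-out genotypes. The first keeps every marker and accounts for correlation between them, while the second discards markers outright, which is the loss of information the abstract refers to.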
