BAYESIAN LARGE-SCALE MULTIPLE REGRESSION WITH SUMMARY STATISTICS FROM GENOME-WIDE ASSOCIATION STUDIES.

Authors

Zhu Xiang, Stephens Matthew

Affiliation

University of Chicago.

Publication information

Ann Appl Stat. 2017;11(3):1561-1592. doi: 10.1214/17-aoas1046. Epub 2017 Oct 5.

Abstract

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a "Regression with Summary Statistics" (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which can also be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations, RSS performs similarly to analyses using the individual data, both for estimating heritability and for detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
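
The abstract describes the RSS likelihood only in words. As a reading aid, its general form can be sketched as follows. This is written from our recollection of the full paper, not from the abstract itself, and the symbols are labels chosen here: $\hat\beta$ is the vector of univariate (single-SNP) effect estimates, $\hat S$ is the diagonal matrix of their standard errors, $\hat R$ is the estimated SNP correlation (LD) matrix, and $\beta$ is the vector of multiple regression coefficients.

```latex
% Sketch of the RSS likelihood (notation assumed, see lead-in):
% the univariate estimates are modeled as multivariate normal,
% with mean and covariance determined by the standard errors and LD.
\[
  L_{\mathrm{rss}}(\beta;\, \hat\beta, \hat S, \hat R)
  \;=\;
  \mathcal{N}\!\left(\hat\beta;\; \hat S \hat R \hat S^{-1} \beta,\; \hat S \hat R \hat S\right),
  \qquad
  \hat S = \operatorname{diag}(\hat s_1, \dots, \hat s_p).
\]
% Componentwise, the mean of the j-th univariate estimate is
\[
  \mathbb{E}\bigl[\hat\beta_j\bigr]
  \;\approx\;
  \sum_{i=1}^{p} \frac{\hat s_j}{\hat s_i}\, \hat r_{ji}\, \beta_i ,
\]
% so each single-SNP estimate absorbs the effects of correlated SNPs,
% weighted by their correlation with SNP j.
```

Read this way, the likelihood needs only the per-SNP estimates, their standard errors, and an LD matrix, which is why no individual-level genotypes or phenotypes are required.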




