A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging.

Affiliation

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.

Publication information

Bioinformatics. 2012 Jul 1;28(13):1738-44. doi: 10.1093/bioinformatics/bts261. Epub 2012 May 4.

DOI: 10.1093/bioinformatics/bts261

PMID: 22563072

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3381972/

Abstract

MOTIVATION

For many complex traits, including height, the majority of variants identified by genome-wide association studies (GWAS) have small effects, leaving a significant proportion of the heritable variation unexplained. Although many penalized multiple regression methodologies have been proposed to increase the power to detect associations for complex genetic architectures, they generally lack mechanisms for false-positive control and diagnostics for model over-fitting. Our methodology is the first penalized multiple regression approach that explicitly controls Type I error rates and provides model over-fitting diagnostics through a novel normally distributed statistic defined for every marker within the GWAS, based on results from a variational Bayes spike regression algorithm.
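
To give a rough sense of what a variational Bayes regression with a spike (point mass at zero) on each marker effect can look like, here is a minimal sketch. It is not the vBsr algorithm and does not compute the paper's Z-statistic: it is a generic mean-field coordinate-ascent scheme for a spike-and-slab prior, and it assumes the noise variance (`sigma2`), the slab variance (`slab_var`), and the prior inclusion probability (`prior_pi`) are fixed and known. All function names, parameter names, and the simulated data are hypothetical.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def vb_spike_slab(columns, y, sigma2=1.0, slab_var=1.0, prior_pi=0.05, n_iter=30):
    """Mean-field updates for y = sum_j x_j * b_j + noise.

    Assumed prior: b_j is 0 with probability 1 - prior_pi, else N(0, slab_var).
    columns: list of centered marker columns; y: centered phenotype.
    Returns (alpha, mu): inclusion probabilities and slab posterior means.
    """
    p = len(columns)
    alpha = [prior_pi] * p
    mu = [0.0] * p
    # Posterior variance of each effect given that the marker is included.
    s2 = [sigma2 / (sum(x * x for x in col) + sigma2 / slab_var) for col in columns]
    resid = list(y)  # y minus the current expected fit (zero at the start)
    logit_pi = math.log(prior_pi / (1.0 - prior_pi))
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            old = alpha[j] * mu[j]
            # Inner product of x_j with the residual that excludes marker j.
            xr = sum(x * (r + x * old) for x, r in zip(col, resid))
            mu[j] = s2[j] / sigma2 * xr
            log_odds = (logit_pi + 0.5 * math.log(s2[j] / slab_var)
                        + mu[j] ** 2 / (2.0 * s2[j]))
            alpha[j] = sigmoid(log_odds)
            # Keep the residual in sync with the updated expected effect.
            delta = alpha[j] * mu[j] - old
            resid = [r - x * delta for x, r in zip(col, resid)]
    return alpha, mu


def simulate(n=300, p=20, effect=0.5, seed=7):
    """Toy data (an assumption for illustration): only marker 0 is causal."""
    rng = random.Random(seed)
    raw = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [effect * raw[0][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    return [center(col) for col in raw], center(y)


columns, y = simulate()
alpha, mu = vb_spike_slab(columns, y)
print("inclusion probability of the causal marker:", round(alpha[0], 3))
print("largest inclusion probability among null markers:", round(max(alpha[1:]), 3))
```

Because every marker is updated against the residual left by all the others, the fit is a multiple-regression one rather than a set of independent marginal tests. A practical implementation would also estimate the hyperparameters instead of fixing them.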

RESULTS

We compare the performance of our method to the lasso and single marker analysis on simulated data and demonstrate that our approach has superior performance in terms of power and Type I error control. In addition, using the Women's Health Initiative (WHI) SNP Health Association Resource (SHARe) GWAS of African-Americans, we show that our method has power to detect additional novel associations with body height. These findings replicate by reaching a stringent cutoff of marginal association in a larger cohort.
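
For readers unfamiliar with the single marker analysis used as a baseline above, the following is a minimal sketch, not the authors' code. It regresses the phenotype on each marker separately, converts each slope to a Z-statistic, and applies a Bonferroni cutoff to two-sided normal p-values. The simulated genotypes, the effect size, the 0.05 family-wise level, and all function names are illustrative assumptions.

```python
import math
import random


def single_marker_z(genotypes, phenotype):
    """Return one marginal Z-statistic per marker.

    genotypes: list of markers, each a list of per-sample allele counts.
    phenotype: list of per-sample trait values.
    """
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    y_c = [y - y_mean for y in phenotype]
    z_stats = []
    for marker in genotypes:
        x_mean = sum(marker) / n
        x_c = [x - x_mean for x in marker]
        sxx = sum(x * x for x in x_c)
        if sxx == 0.0:
            z_stats.append(0.0)  # a monomorphic marker carries no information
            continue
        beta = sum(x * y for x, y in zip(x_c, y_c)) / sxx
        rss = sum((y - beta * x) ** 2 for x, y in zip(x_c, y_c))
        se = math.sqrt(rss / (n - 2) / sxx)  # n - 2: intercept and slope
        z_stats.append(beta / se)
    return z_stats


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_gwas(n=400, p=50, maf=0.3, effect=0.8, seed=1):
    """Toy data (an assumption for illustration): only marker 0 is causal."""
    rng = random.Random(seed)
    genotypes = [[sum(rng.random() < maf for _ in range(2)) for _ in range(n)]
                 for _ in range(p)]
    phenotype = [effect * genotypes[0][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    return genotypes, phenotype


genotypes, phenotype = simulate_gwas()
z = single_marker_z(genotypes, phenotype)
cutoff = 0.05 / len(z)  # Bonferroni correction across all tested markers
hits = [j for j, zj in enumerate(z) if two_sided_p(zj) < cutoff]
print("markers passing the Bonferroni cutoff:", hits)
```

Each marker is tested in isolation here, so the Type I error control comes entirely from the multiple-testing correction. This is the property that penalized multiple regression methods usually give up and that the abstract says the proposed statistic restores.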

AVAILABILITY

An R-package, including an implementation of our variational Bayes spike regression (vBsr) algorithm, is available at http://kooperberg.fhcrc.org/soft.html.
