The Bayesian lasso for genome-wide association studies.

Affiliation

Department of Statistics, Pennsylvania State University, State College, PA 16802, USA.

Publication information

Bioinformatics. 2011 Feb 15;27(4):516-23. doi: 10.1093/bioinformatics/btq688. Epub 2010 Dec 14.

DOI:10.1093/bioinformatics/btq688

PMID:21156729

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3105480/

Abstract

MOTIVATION

Despite their success in identifying genes that affect complex diseases or traits, current genome-wide association studies (GWASs) based on single-SNP analysis are too simple to give a comprehensive picture of the genetic architecture of phenotypes. A simultaneous analysis of a large number of SNPs, although statistically challenging, especially with a small number of samples, is crucial for genetic modeling.

METHOD

We propose a two-stage procedure for multi-SNP modeling and analysis in GWASs: first, a 'preconditioned' response variable is produced using supervised principal component analysis; then a Bayesian lasso is formulated to select a subset of significant SNPs. The Bayesian lasso is implemented with a hierarchical model, in which scale mixtures of normals are used as prior distributions for the genetic effects and exponential priors are placed on their variances, and is fitted with a Markov chain Monte Carlo (MCMC) algorithm. Our approach obviates the choice of the lasso parameter by imposing a diffuse hyperprior on it and estimating it along with the other parameters. It is particularly powerful for selecting the most relevant SNPs in GWASs, where the number of predictors exceeds the number of observations.
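
The two stages described above can be illustrated with a minimal sketch. This is not the authors' released code: it assumes the standard Gibbs sampler for the Bayesian lasso (normal scale-mixture prior on the effects, exponential prior on their variances, gamma hyperprior on the squared lasso parameter) and a simple form of supervised principal component preconditioning. The function names `precondition` and `bayesian_lasso_gibbs`, the hyperparameters `r` and `delta`, the cut-offs `n_keep` and `n_pc`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def precondition(X, y, n_keep=10, n_pc=3):
    """Stage 1 (assumed form): supervised PCA preconditioning of the response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Univariate association score of each SNP with the phenotype.
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    keep = np.argsort(scores)[-n_keep:]            # SNPs with the largest scores
    U, _, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    U = U[:, :n_pc]                                # leading principal components
    return U @ (U.T @ yc)                          # fitted (preconditioned) response

def bayesian_lasso_gibbs(X, y, n_iter=600, burn_in=200, r=1.0, delta=0.1, seed=0):
    """Stage 2 (assumed form): Gibbs sampler for the Bayesian lasso."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    XtX, Xty = Xc.T @ Xc, Xc.T @ yc
    sigma2, inv_tau2, lam2 = yc.var(), np.ones(p), 1.0
    draws = []
    for it in range(n_iter):
        # Effects: beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), A = X'X + diag(1/tau2).
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # Residual variance: inverse-gamma full conditional.
        resid = yc - Xc @ beta
        rate = (resid @ resid + beta @ (inv_tau2 * beta)) / 2.0
        sigma2 = rate / rng.gamma((n - 1 + p) / 2.0)
        # Effect variances: 1/tau2_j | rest ~ inverse Gaussian.
        mu = np.sqrt(lam2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = np.maximum(rng.wald(mu, lam2), 1e-12)
        # Lasso parameter: lambda^2 | rest ~ Gamma(p + r, rate = sum(tau2)/2 + delta).
        lam2 = rng.gamma(p + r, 1.0 / (np.sum(1.0 / inv_tau2) / 2.0 + delta))
        if it >= burn_in:
            draws.append(beta)
    return np.array(draws)

# Simulated genotypes (0/1/2 allele counts) with three causal SNPs.
rng = np.random.default_rng(1)
n, p = 100, 30
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -3.0, 2.0]
y = X @ true_beta + rng.standard_normal(n)

y_pre = precondition(X, y)               # stage 1
draws = bayesian_lasso_gibbs(X, y_pre)   # stage 2
post_mean = draws.mean(axis=0)           # posterior mean effect of each SNP
```

SNPs can then be ranked by the posterior summaries in `draws`, for example by checking whether a credible interval for an effect excludes zero. Because the lasso parameter is sampled inside the loop, no cross-validation step is needed to choose it, which is the point made in the abstract.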

RESULTS

The new approach was examined through a simulation study. By using the approach to analyze a real dataset from the Framingham Heart Study, we detected several significant genes that are associated with body mass index (BMI). Our findings support previous results on BMI-related SNPs and provide new insights into the genetic control of this trait.

AVAILABILITY

The computer code for the approach is available at the Penn State Center for Statistical Genetics website, http://statgen.psu.edu.
