Efficient Bayesian mixed-model analysis increases association power in large cohorts.

Authors

Loh Po-Ru, Tucker George, Bulik-Sullivan Brendan K, Vilhjálmsson Bjarni J, Finucane Hilary K, Salem Rany M, Chasman Daniel I, Ridker Paul M, Neale Benjamin M, Berger Bonnie, Patterson Nick, Price Alkes L

Affiliations

[1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

[1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. [3] Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA.

Publication information

Nat Genet. 2015 Mar;47(3):284-90. doi: 10.1038/ng.3190. Epub 2015 Feb 2.

DOI:10.1038/ng.3190

PMID:25642633

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4342297/

Abstract

Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require time cost O(MN^2) (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN) time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
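
The contrast between O(MN^2) and O(MN) cost can be made concrete with a small sketch. Forming the N x N relatedness matrix XX^T/M explicitly costs O(MN^2), whereas an iterative solver only needs products of the form X(X^T v), each costing O(MN). The example below uses conjugate gradient as one standard iterative solver. It is an illustration under stated assumptions, not the BOLT-LMM implementation: the function names, the variance parameters `sigma_g2` and `sigma_e2`, and the simulated genotype matrix are all hypothetical, and the abstract does not say which solver the authors use.

```python
import numpy as np


def matvec_V(X, v, sigma_g2, sigma_e2):
    """Apply V = sigma_g2 * X X^T / M + sigma_e2 * I to v.

    X X^T is never formed: two matrix-vector products cost O(MN).
    """
    M = X.shape[1]
    return sigma_g2 * (X @ (X.T @ v)) / M + sigma_e2 * v


def conjugate_gradient(X, y, sigma_g2, sigma_e2, tol=1e-8, max_iter=200):
    """Solve V x = y by conjugate gradient, using only O(MN) matvecs."""
    x = np.zeros_like(y)
    r = y.copy()              # residual y - V x, with x = 0
    p = r.copy()              # search direction
    rs = r @ r
    y_norm = np.sqrt(y @ y)
    for it in range(1, max_iter + 1):
        Vp = matvec_V(X, p, sigma_g2, sigma_e2)
        alpha = rs / (p @ Vp)
        x += alpha * p
        r -= alpha * Vp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * y_norm:   # relative residual small enough
            return x, it
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter


# Hypothetical simulated data: N samples, M standardized markers.
rng = np.random.default_rng(0)
N, M = 200, 500
X = rng.standard_normal((N, M))
y = rng.standard_normal(N)

x_cg, n_iter = conjugate_gradient(X, y, sigma_g2=0.5, sigma_e2=0.5)

# Reference solution that does form the N x N matrix (the O(MN^2) route).
V = 0.5 * (X @ X.T) / M + 0.5 * np.eye(N)
x_direct = np.linalg.solve(V, y)
print(n_iter, np.max(np.abs(x_cg - x_direct)))
```

Total cost is the number of iterations times O(MN), so the approach pays off when the iteration count stays small relative to N, which is the regime the abstract describes.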
