Loh Po-Ru, Tucker George, Bulik-Sullivan Brendan K, Vilhjálmsson Bjarni J, Finucane Hilary K, Salem Rany M, Chasman Daniel I, Ridker Paul M, Neale Benjamin M, Berger Bonnie, Patterson Nick, Price Alkes L
[1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.
[1] Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA. [2] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. [3] Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA.
Nat Genet. 2015 Mar;47(3):284-90. doi: 10.1038/ng.3190. Epub 2015 Feb 2.
Linear mixed models are a powerful statistical tool for identifying genetic associations and avoiding confounding. However, existing methods are computationally intractable in large cohorts and may not optimize power. All existing methods require O(MN^2) time (where N is the number of samples and M is the number of SNPs) and implicitly assume an infinitesimal genetic architecture in which effect sizes are normally distributed, which can limit power. Here we present a far more efficient mixed-model association method, BOLT-LMM, which requires only a small number of O(MN)-time iterations and increases power by modeling more realistic, non-infinitesimal genetic architectures via a Bayesian mixture prior on marker effect sizes. We applied BOLT-LMM to 9 quantitative traits in 23,294 samples from the Women's Genome Health Study (WGHS) and observed significant increases in power, consistent with simulations. Theory and simulations show that the boost in power increases with cohort size, making BOLT-LMM appealing for genome-wide association studies in large cohorts.
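To illustrate how a small number of O(MN)-time iterations can stand in for an O(MN^2) computation, here is a minimal sketch. It is not BOLT-LMM's implementation. It assumes a standardized N x M genotype matrix X and a mixed-model covariance of the common form X X^T / M + delta * I, where delta (a noise-to-genetic variance ratio) and the function name solve_mixed_model_system are hypothetical choices for this example. The sketch solves the corresponding linear system by conjugate gradient and touches X only through matrix-vector products, so each iteration costs O(MN). Explicitly forming the N x N kinship matrix X X^T / M would by itself cost O(MN^2).

```python
import numpy as np


def solve_mixed_model_system(X, delta, b, tol=1e-10, max_iter=1000):
    """Solve (X X^T / M + delta * I) x = b by conjugate gradient.

    Hypothetical helper for illustration. X is an N x M standardized genotype
    matrix (N samples, M SNPs). The N x N matrix X X^T / M is never formed.
    """
    N, M = X.shape

    def matvec(v):
        # X.T @ v costs O(MN), X @ (...) costs O(MN), delta * v costs O(N).
        return X @ (X.T @ v) / M + delta * v

    x = np.zeros(N)
    r = b - matvec(x)              # initial residual
    p = r.copy()                   # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:  # relative residual stopping rule
            break
        p = r + (rs_new / rs_old) * p        # next A-conjugate direction
        rs_old = rs_new
    return x


# Usage: random stand-in data (not real genotypes).
rng = np.random.default_rng(0)
N, M = 200, 1000
X = rng.standard_normal((N, M))
b = rng.standard_normal(N)
x = solve_mixed_model_system(X, 1.0, b)

# Check against the explicit O(MN^2) matrix, feasible only at toy sizes.
V = X @ X.T / M + 1.0 * np.eye(N)
print(np.allclose(V @ x, b))
```

Because the system matrix is symmetric positive definite, with eigenvalues bounded below by delta, conjugate gradient typically converges in a modest number of iterations. The total cost is therefore roughly (number of iterations) x O(MN).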