Duan Weiwei, Zhao Yang, Wei Yongyue, Yang Sheng, Bai Jianling, Shen Sipeng, Du Mulong, Huang Lihong, Hu Zhibin, Chen Feng
Department of Biostatistics, School of Public Health, Nanjing Medical University, 101 Longmian Road, Nanjing, Jiangsu, China.
The Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, Nanjing Medical University, Nanjing, China.
Mol Genet Genomics. 2017 Aug;292(4):923-934. doi: 10.1007/s00438-017-1322-4. Epub 2017 May 22.
Genome-wide association studies (GWAS) have identified a large number of single-nucleotide polymorphisms (SNPs) associated with complex traits. A recently developed linear mixed model that estimates heritability by fitting all SNPs simultaneously suggests that common variants can explain a substantial fraction of heritability, which hints at the low power of the single-variant analysis typically used in GWAS. Consequently, many multi-locus shrinkage models have been proposed under a Bayesian framework. However, most rely on Markov chain Monte Carlo (MCMC) algorithms, which are time-consuming and challenging to apply to GWAS data. Here, we propose a fast algorithm for the Bayesian adaptive lasso based on variational inference (BAL-VI). Extensive simulations and real data analysis indicate that our model outperforms the well-known Bayesian lasso and Bayesian adaptive lasso models in both accuracy and speed. BAL-VI can complete a simultaneous analysis of a lung cancer GWAS dataset with ~3400 subjects and ~570,000 SNPs in about half a day.