Pereira Miguel, Thompson John R, Weichenberger Christian X, Thomas Duncan C, Minelli Cosetta
National Heart and Lung Institute, Imperial College London, London, United Kingdom.
Department of Health Sciences, University of Leicester, Leicester, United Kingdom.
Genet Epidemiol. 2017 May;41(4):320-331. doi: 10.1002/gepi.22038. Epub 2017 Apr 10.
With the aim of improving detection of novel single-nucleotide polymorphisms (SNPs) in genetic association studies, we propose a method of including prior biological information in a Bayesian shrinkage model that jointly estimates SNP effects. We assume that the SNP effects follow a normal distribution centered at zero with variance controlled by a shrinkage hyperparameter. We use biological information to define the amount of shrinkage applied to the distribution of SNP effects, so that the effects of SNPs with more biological support are shrunk less toward zero and are therefore more likely to be detected. The performance of the method was tested in a simulation study (1,000 datasets, 500 subjects with ∼200 SNPs in 10 linkage disequilibrium (LD) blocks) using both a continuous and a binary outcome. It was further tested in an empirical example on body mass index (continuous) and overweight (binary) in a dataset of 1,829 subjects and 2,614 SNPs from 30 blocks. Biological knowledge was retrieved using the bioinformatics tool Dintor, which queried various databases. The joint Bayesian model with inclusion of prior information outperformed the standard analysis: in the simulation study, the mean ranking of the true LD block was 2.8 for the Bayesian model versus 3.6 for the standard analysis of individual SNPs; in the empirical example, the mean ranking of the six true blocks was 8.5 versus 9.3 in the standard analysis. These results suggest that our method is more powerful than the standard analysis. We expect its performance to improve further as more biological information about SNPs becomes available.
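The shrinkage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a continuous outcome, a known residual variance, and fixed per-SNP prior variances, whereas the paper estimates a shrinkage hyperparameter within a full Bayesian model. Under those simplifying assumptions the posterior mean of the SNP effects has a closed form. The function names, the `score` input (a biological-support value in [0, 1]), and the log-linear mapping from score to prior variance are all hypothetical and chosen only for illustration.

```python
import numpy as np


def prior_variance_from_score(score, base_var=0.01, max_var=1.0):
    """Hypothetical mapping from a biological-support score in [0, 1]
    to a prior variance: score 0 gives base_var (strong shrinkage),
    score 1 gives max_var (weak shrinkage), log-linear in between."""
    score = np.clip(np.asarray(score, dtype=float), 0.0, 1.0)
    return base_var * (max_var / base_var) ** score


def shrinkage_posterior_mean(X, y, prior_var, noise_var=1.0):
    """Posterior mean of beta for y ~ N(X beta, noise_var * I) with
    independent priors beta_j ~ N(0, prior_var[j]).
    Assumes noise_var is known (a simplification)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    # Posterior precision = data precision + prior precision.
    precision = X.T @ X / noise_var + np.diag(1.0 / prior_var)
    return np.linalg.solve(precision, X.T @ y / noise_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_subjects, n_snps = 500, 20
    # Simulated genotypes coded 0/1/2, then centered column-wise.
    X = rng.binomial(2, 0.3, size=(n_subjects, n_snps)).astype(float)
    X -= X.mean(axis=0)
    # Only the first SNP has a true effect in this toy dataset.
    y = 0.3 * X[:, 0] + rng.normal(size=n_subjects)

    # Assumed scores: the first SNP has full biological support.
    scores = np.zeros(n_snps)
    scores[0] = 1.0

    informed = shrinkage_posterior_mean(X, y, prior_variance_from_score(scores))
    uniform = shrinkage_posterior_mean(X, y, np.full(n_snps, 0.01))
    print("SNP 1 effect, informed prior:", informed[0])
    print("SNP 1 effect, uniform prior: ", uniform[0])
```

In this sketch a larger prior variance means less shrinkage toward zero, so a SNP with a high support score keeps more of its estimated effect than it would under a uniform, strongly shrinking prior. A binary outcome would need a logistic likelihood, which has no closed-form posterior and is not covered here.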