Spencer Amy V, Cox Angela, Lin Wei-Yu, Easton Douglas F, Michailidou Kyriaki, Walters Kevin
Advanced Analytics Centre, Global Medicines Development, AstraZeneca, Alderley Park, Macclesfield, United Kingdom.
School of Mathematics and Statistics, University of Sheffield, Sheffield, United Kingdom.
Genet Epidemiol. 2016 Apr;40(3):176-87. doi: 10.1002/gepi.21956. Epub 2016 Feb 1.
A large amount of functional genetic data is available and can be used to inform fine-mapping association studies in diseases with well-characterised disease pathways. Single nucleotide polymorphism (SNP) prioritization via Bayes factors is attractive because prior information can inform the effect size or the prior probability of causal association. This approach requires specifying a prior distribution for the effect size. If the information needed to estimate a priori the probability density of the effect sizes for causal SNPs in a genomic region is inconsistent or unavailable, then specifying a prior variance for the effect sizes is challenging. We propose both an empirical method to estimate this prior variance and a coherent approach to using SNP-level functional data to inform the prior probability of causal association. Through simulation we show that when SNPs are ranked by our empirical Bayes factor in a fine-mapping study, the causal SNP rank is generally as high as or higher than the rank obtained using Bayes factors with other plausible values of the prior variance. Importantly, we also show that assigning SNP-specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNP ranks compared to ranking with identical prior probabilities of association. We demonstrate the use of our methods by applying them to the fine mapping of the CASP8 region of chromosome 2 using genotype data from the Collaborative Oncological Gene-Environment Study (COGS) Consortium. The data we analysed included approximately 46,000 breast cancer case and 43,000 healthy control samples.
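The ranking idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a Wakefield-style approximate Bayes factor with a normal N(0, W) prior on the effect size, exactly one causal SNP in the region, and a simple grid search that picks the prior variance W maximising the prior-weighted average Bayes factor as a stand-in for the empirical estimate. The function names, the grid, and all effect estimates, standard errors, and prior probabilities are hypothetical values chosen only for illustration.

```python
import math


def log_abf(beta_hat, se, prior_var):
    """Log approximate Bayes factor (association vs. null) for one SNP.

    Assumes beta_hat ~ N(beta, se^2) and a N(0, prior_var) prior on beta.
    """
    v = se ** 2
    shrink = prior_var / (v + prior_var)
    z2 = (beta_hat / se) ** 2
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z2 * shrink


def posterior_probs(log_bfs, priors):
    """Posterior probability that each SNP is the single causal SNP."""
    log_terms = [math.log(p) + lb for p, lb in zip(priors, log_bfs)]
    m = max(log_terms)  # subtract the maximum for numerical stability
    weights = [math.exp(t - m) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


def empirical_prior_var(betas, ses, priors, grid):
    """Grid value of W that maximises the prior-weighted average Bayes factor.

    An illustrative empirical choice, not the estimator from the paper.
    """
    def log_marginal(w):
        terms = [math.log(p) + log_abf(b, s, w)
                 for b, s, p in zip(betas, ses, priors)]
        m = max(terms)
        return m + math.log(sum(math.exp(t - m) for t in terms))

    return max(grid, key=log_marginal)


# Hypothetical summary statistics for four SNPs in one region.
betas = [0.02, 0.11, 0.09, -0.01]   # estimated log odds ratios
ses = [0.03, 0.03, 0.03, 0.03]      # standard errors
flat = [0.25, 0.25, 0.25, 0.25]     # identical prior probabilities
informed = [0.1, 0.2, 0.6, 0.1]     # hypothetical functional priors
grid = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1]

w_hat = empirical_prior_var(betas, ses, flat, grid)
log_bfs = [log_abf(b, s, w_hat) for b, s in zip(betas, ses)]
pp_flat = posterior_probs(log_bfs, flat)
pp_informed = posterior_probs(log_bfs, informed)

for i, (pf, pi) in enumerate(zip(pp_flat, pp_informed)):
    print(f"SNP {i}: flat prior PP = {pf:.3f}, informed prior PP = {pi:.3f}")
```

In this sketch, up-weighting the third SNP through its prior raises its posterior probability relative to the flat-prior analysis, which mirrors the qualitative claim that informative SNP-specific priors can improve the rank of a causal SNP when the functional knowledge is well founded.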