Inclusion of biological knowledge in a Bayesian shrinkage model for joint estimation of SNP effects.

Authors

Pereira Miguel, Thompson John R, Weichenberger Christian X, Thomas Duncan C, Minelli Cosetta

Affiliations

National Heart and Lung Institute, Imperial College London, London, United Kingdom.

Department of Health Sciences, University of Leicester, Leicester, United Kingdom.

Publication information

Genet Epidemiol. 2017 May;41(4):320-331. doi: 10.1002/gepi.22038. Epub 2017 Apr 10.

DOI:10.1002/gepi.22038

PMID:28393391

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5397385/

Abstract

With the aim of improving detection of novel single-nucleotide polymorphisms (SNPs) in genetic association studies, we propose a method of including prior biological information in a Bayesian shrinkage model that jointly estimates SNP effects. We assume that the SNP effects follow a normal distribution centered at zero with variance controlled by a shrinkage hyperparameter. We use biological information to define the amount of shrinkage applied to the distribution of SNP effects, so that the effects of SNPs with more biological support are shrunk less toward zero and are therefore more likely to be detected. The performance of the method was tested in a simulation study (1,000 datasets, 500 subjects with approximately 200 SNPs in 10 linkage disequilibrium (LD) blocks) using a continuous and a binary outcome. It was further tested in an empirical example on body mass index (continuous) and overweight (binary) in a dataset of 1,829 subjects and 2,614 SNPs from 30 blocks. Biological knowledge was retrieved using the bioinformatics tool Dintor, which queried various databases. The joint Bayesian model with inclusion of prior information outperformed the standard analysis: in the simulation study, the mean ranking of the true LD block was 2.8 for the Bayesian model versus 3.6 for the standard analysis of individual SNPs; in the empirical example, the mean ranking of the six true blocks was 8.5 for the Bayesian model versus 9.3 for the standard analysis. These results suggest that our method is more powerful than the standard analysis. We expect its performance to improve further as more biological information about SNPs becomes available.
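
The shrinkage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not give the hyperprior, the fitting procedure, the handling of binary outcomes, or how Dintor output is turned into a shrinkage parameter. The sketch assumes a linear model with known noise variance and an independent normal prior centered at zero for each SNP effect, so the posterior mean has a closed form. The function names, the support score in [0, 1], and the `base_var` and `max_var` values are all hypothetical.

```python
import numpy as np


def posterior_mean_effects(X, y, prior_var, noise_var=1.0):
    """Posterior mean of SNP effects under beta_j ~ N(0, prior_var[j]).

    Assumes y = X @ beta + e with e ~ N(0, noise_var * I) and noise_var
    known, which is a simplification made only for this illustration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_precision = np.diag(1.0 / np.asarray(prior_var, dtype=float))
    # Posterior precision is the likelihood precision plus the prior precision.
    A = X.T @ X / noise_var + prior_precision
    b = X.T @ y / noise_var
    return np.linalg.solve(A, b)


def prior_variance_from_score(score, base_var=0.01, max_var=1.0):
    """Hypothetical mapping from a biological support score in [0, 1]
    to a prior variance: more support gives a larger variance, hence
    less shrinkage toward zero."""
    score = np.clip(np.asarray(score, dtype=float), 0.0, 1.0)
    # Interpolate between base_var and max_var on the log scale.
    return base_var * (max_var / base_var) ** score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_subjects, n_snps = 500, 20

    # Simulated additive genotypes (0/1/2 copies), centered per SNP.
    X = rng.binomial(2, 0.3, size=(n_subjects, n_snps)).astype(float)
    X -= X.mean(axis=0)

    # One truly associated SNP (index 3) and a continuous outcome.
    true_beta = np.zeros(n_snps)
    true_beta[3] = 0.3
    y = X @ true_beta + rng.normal(size=n_subjects)
    y -= y.mean()

    # Assumed biological support: only SNP 3 has prior evidence.
    score = np.zeros(n_snps)
    score[3] = 1.0

    flat = posterior_mean_effects(X, y, np.full(n_snps, 0.01))
    informed = posterior_mean_effects(X, y, prior_variance_from_score(score))
    print("SNP 3, same shrinkage for all SNPs:", round(flat[3], 3))
    print("SNP 3, biologically informed shrinkage:", round(informed[3], 3))
```

In this simplified setting, a SNP with a larger prior variance keeps an estimate closer to its unshrunk value, while SNPs without support are pulled strongly toward zero. This is the mechanism the abstract describes for making biologically supported SNPs easier to detect.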

