Bayesian graphical models for genomewide association studies.

Authors

Verzilli Claudio J, Stallard Nigel, Whittaker John C

Affiliation

Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, UK.

Publication details

Am J Hum Genet. 2006 Jul;79(1):100-12. doi: 10.1086/505313. Epub 2006 May 30.

DOI:10.1086/505313

PMID:16773569

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1474122/

Abstract

As the extent of human genetic variation becomes more fully characterized, the research community is faced with the challenging task of using this information to dissect the heritable components of complex traits. Genomewide association studies offer great promise in this respect, but their analysis poses formidable difficulties. In this article, we describe a computationally efficient approach to mining genotype-phenotype associations that scales to the size of the data sets currently being collected in such studies. We use discrete graphical models as a data-mining tool, searching for single- or multilocus patterns of association around a causative site. The approach is fully Bayesian, allowing us to incorporate prior knowledge on the spatial dependencies around each marker due to linkage disequilibrium, which reduces considerably the number of possible graphical structures. A Markov chain-Monte Carlo scheme is developed that yields samples from the posterior distribution of graphs conditional on the data from which probabilistic statements about the strength of any genotype-phenotype association can be made. Using data simulated under scenarios that vary in marker density, genotype relative risk of a causative allele, and mode of inheritance, we show that the proposed approach has better localization properties and leads to lower false-positive rates than do single-locus analyses. Finally, we present an application of our method to a quasi-synthetic data set in which data from the CYP2D6 region are embedded within simulated data on 100K single-nucleotide polymorphisms. Analysis is quick (<5 min), and we are able to localize the causative site to a very short interval.
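
The abstract describes sampling graph structures by MCMC and reading association strength off the posterior. The following is a minimal sketch of that general idea only, not the authors' algorithm: it runs a Metropolis-Hastings chain over which markers have an edge to a binary phenotype, scores each graph with a Beta-Binomial marginal likelihood, and reports posterior edge-inclusion frequencies. The flat edge prior `prior_edge`, the cap `max_parents`, the function names, and the toy simulator are all assumptions made for illustration; in particular, the linkage-disequilibrium-based spatial prior mentioned in the abstract is not modeled here.

```python
import math
import random
from collections import defaultdict


def log_marginal(pheno, geno, parents, a=1.0):
    """Log Beta-Binomial marginal likelihood of a binary phenotype, with one
    Beta(a, a) prior per joint genotype configuration at the parent markers."""
    counts = defaultdict(lambda: [0, 0])
    for i, y in enumerate(pheno):
        key = tuple(geno[i][j] for j in parents)
        counts[key][y] += 1
    total = 0.0
    for n0, n1 in counts.values():
        total += (math.lgamma(2 * a) - 2 * math.lgamma(a)
                  + math.lgamma(a + n0) + math.lgamma(a + n1)
                  - math.lgamma(2 * a + n0 + n1))
    return total


def sample_edges(pheno, geno, n_iter=2000, prior_edge=0.05, max_parents=2, seed=0):
    """Metropolis-Hastings over sets of marker-phenotype edges.
    Returns the fraction of iterations in which each edge was present
    (no burn-in is discarded in this sketch)."""
    rng = random.Random(seed)
    n_markers = len(geno[0])
    log_prior_odds = math.log(prior_edge / (1.0 - prior_edge))
    parents = frozenset()
    current = log_marginal(pheno, geno, sorted(parents))
    visits = [0] * n_markers
    for _ in range(n_iter):
        j = rng.randrange(n_markers)
        proposal = parents ^ {j}  # toggle one edge (symmetric proposal)
        if len(proposal) <= max_parents:
            cand = log_marginal(pheno, geno, sorted(proposal))
            # Adding an edge multiplies the prior by p/(1-p); removing divides.
            delta = cand - current + (log_prior_odds if j in proposal else -log_prior_odds)
            if rng.random() < math.exp(min(0.0, delta)):
                parents, current = proposal, cand
        for k in parents:
            visits[k] += 1
    return [v / n_iter for v in visits]


def simulate_toy_data(n=400, n_markers=6, causal=3, seed=1):
    """Hypothetical toy data: independent genotypes coded 0/1/2 and a binary
    phenotype whose risk rises with the genotype at the causal marker."""
    rng = random.Random(seed)
    geno = [[rng.choice((0, 1, 2)) for _ in range(n_markers)] for _ in range(n)]
    pheno = [int(rng.random() < 0.2 + 0.3 * row[causal]) for row in geno]
    return pheno, geno
```

Calling `sample_edges(*simulate_toy_data())` returns one inclusion frequency per marker; on data with a strong single-marker effect, the frequency for the causal marker would be expected to dominate. A realistic analysis would need correlated markers, a prior that restricts edges to nearby loci, and convergence checks, none of which this sketch attempts.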

