Lunn David J, Whittaker John C, Best Nicky
Department of Epidemiology and Public Health, Imperial College London, St. Mary's Campus, London, UK.
Genet Epidemiol. 2006 Apr;30(3):231-47. doi: 10.1002/gepi.20140.
We present a range of modelling components designed to facilitate Bayesian analysis of genetic-association-study data. A key feature of our approach is the ability to combine different submodels, almost arbitrarily, to deal with the complexities of real data. In particular, we propose various techniques for selecting the "best" subset of genetic predictors for a specific phenotype (or set of phenotypes). At the same time, we may control for complex, non-linear relationships between phenotypes and additional (non-genetic) covariates, and account for any residual correlation that exists among multiple phenotypes. Both of these additional modelling components are shown to potentially aid in detecting the underlying genetic signal. We may also account for uncertainty regarding missing genotype data. Indeed, at the heart of our approach is a novel method for reconstructing unobserved haplotypes and/or inferring the values of missing genotypes. This method can be deployed independently or fully integrated into arbitrary genotype- or haplotype-based association models, such that the missing data and the association model are "estimated" simultaneously. The impact of such simultaneous analysis on inferences drawn from the association model is shown to be potentially significant. Our modelling components are packaged as an "add-on" interface to the widely used WinBUGS software, which allows Markov chain Monte Carlo analysis of a wide range of statistical models. We illustrate their use with a series of increasingly complex analyses conducted on simulated data based on a real pharmacogenetic example.