MRC Biostatistics Unit, Cambridge University, UK.
Stat Med. 2010 May 30;29(12):1298-311. doi: 10.1002/sim.3843.
Genetic markers can be used as instrumental variables, in an analogous way to randomization in a clinical trial, to estimate the causal relationship between a phenotype and an outcome variable. Our purpose is to extend the existing methods for such Mendelian randomization studies to the context of multiple genetic markers measured in multiple studies, based on the analysis of individual participant data. First, for a single genetic marker in one study, we show that the usual ratio of coefficients approach can be reformulated as a regression with heterogeneous error in the explanatory variable. This can be implemented using a Bayesian approach, which is next extended to include multiple genetic markers. We then propose a hierarchical model for undertaking a meta-analysis of multiple studies, in which the same genetic markers need not be measured in every study. This provides an overall estimate of the causal relationship between the phenotype and the outcome, and an assessment of its heterogeneity across studies. As an example, we estimate the causal effect of blood concentrations of C-reactive protein on fibrinogen levels using data from 11 studies. These methods provide a flexible framework for efficient estimation of causal relationships derived from multiple studies. Issues discussed include weak instrument bias, analysis of binary outcome data such as disease risk, missing genetic data, and the use of haplotypes.
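For readers unfamiliar with the ratio of coefficients approach mentioned above, the following is a minimal sketch for a single genetic marker in one study. It is not the Bayesian model of the paper. It illustrates only the classical ratio estimate, in which the regression slope of the outcome on the marker is divided by the regression slope of the phenotype on the marker. The simulated data, the function names `simulate` and `ratio_estimate`, and all parameter values (allele frequency, effect sizes, sample size) are assumptions chosen for illustration.

```python
import numpy as np


def simulate(n=50_000, causal_effect=0.5, seed=0):
    """Simulate a marker g, phenotype x, and outcome y with an unmeasured confounder.

    All numeric values here are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # allele count 0, 1, or 2
    u = rng.normal(size=n)                          # unmeasured confounder
    x = 0.5 * g + u + rng.normal(size=n)            # phenotype depends on g and u
    y = causal_effect * x + u + rng.normal(size=n)  # outcome depends on x and u
    return g, x, y


def slope(a, b):
    """Least-squares slope from regressing b on a (with an intercept)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


def ratio_estimate(g, x, y):
    """Ratio of coefficients: (slope of y on g) / (slope of x on g)."""
    return slope(g, y) / slope(g, x)


g, x, y = simulate()
naive = slope(x, y)            # observational estimate, biased by the confounder u
iv = ratio_estimate(g, x, y)   # instrumental variable estimate using the marker
print(f"naive regression slope: {naive:.3f}")
print(f"ratio estimate:         {iv:.3f}")
```

Under these assumptions the naive regression of outcome on phenotype is pulled away from the simulated causal effect by the confounder, while the ratio estimate should lie close to it. With a weaker marker-phenotype association the denominator becomes noisy, which is one way to see the weak instrument issue the abstract lists.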