Burgess Stephen, Scott Robert A, Timpson Nicholas J, Davey Smith George, Thompson Simon G
Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Eur J Epidemiol. 2015 Jul;30(7):543-52. doi: 10.1007/s10654-015-0011-z. Epub 2015 Mar 15.
Finding individual-level data for adequately powered Mendelian randomization analyses may be problematic. As publicly available summarized data on genetic associations with disease outcomes from large consortia are becoming more abundant, use of published data is an attractive analysis strategy for obtaining precise estimates of the causal effects of risk factors on outcomes. We detail the necessary steps for conducting Mendelian randomization investigations using published data, and present novel statistical methods for combining data on the associations of multiple (correlated or uncorrelated) genetic variants with the risk factor and outcome into a single causal effect estimate. A two-sample analysis strategy may be employed, in which evidence on the gene-risk factor and gene-outcome associations is taken from different data sources. These approaches allow the efficient identification of risk factors that are suitable targets for clinical intervention from published data, although the ability to assess the assumptions necessary for causal inference is diminished. Methods and guidance are illustrated using the example of the causal effect of serum calcium levels on fasting glucose concentrations. The estimated causal effect of a 1 standard deviation (0.13 mmol/L) increase in calcium levels on fasting glucose (mM) using a single lead variant from the CASR gene region is 0.044 (95% credible interval -0.002, 0.100). In contrast, using our method to account for the correlation between variants, the corresponding estimate using 17 genetic variants is 0.022 (95% credible interval 0.009, 0.035), a more clearly positive causal effect.
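To make the idea of combining summarized associations from several correlated variants concrete, the following is a minimal sketch of a correlation-adjusted inverse-variance weighted (generalized weighted regression) estimator. It is a standard approach for summarized data and is offered only as an illustration: the abstract does not specify the estimator behind the reported credible intervals, so this should not be read as the authors' exact method. The function name `ivw_correlated`, its arguments, and all numbers in the usage example are hypothetical and are not taken from the calcium and fasting glucose analysis.

```python
import numpy as np


def ivw_correlated(beta_x, beta_y, se_y, rho):
    """Correlation-adjusted inverse-variance weighted causal estimate.

    Assumed inputs (all hypothetical names):
      beta_x : per-variant associations with the risk factor
      beta_y : per-variant associations with the outcome
      se_y   : standard errors of beta_y
      rho    : variant correlation (linkage disequilibrium) matrix

    Uncertainty in beta_x is ignored, a common simplifying assumption.
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    rho = np.asarray(rho, dtype=float)

    # Covariance of the outcome associations: Omega_ij = se_i * se_j * rho_ij
    omega = np.outer(se_y, se_y) * rho

    # Solve Omega z = beta_x rather than inverting Omega explicitly
    omega_inv_bx = np.linalg.solve(omega, beta_x)

    denom = beta_x @ omega_inv_bx
    estimate = (omega_inv_bx @ beta_y) / denom
    std_error = np.sqrt(1.0 / denom)
    return estimate, std_error


# Illustrative usage with made-up summary statistics for three variants
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.004, 0.011, 0.006]
se_y = [0.010, 0.020, 0.015]
rho = [[1.0, 0.3, 0.1],
       [0.3, 1.0, 0.2],
       [0.1, 0.2, 1.0]]

est, se = ivw_correlated(beta_x, beta_y, se_y, rho)
print(f"estimate = {est:.4f}, standard error = {se:.4f}")
```

With an identity correlation matrix this reduces to the usual inverse-variance weighted estimate for uncorrelated variants, and with a single variant it reduces to the ratio of the outcome association to the risk factor association.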