Department of Health Studies, University of Chicago, Chicago, IL, USA.
Int J Epidemiol. 2011 Jun;40(3):740-52. doi: 10.1093/ije/dyq151. Epub 2010 Sep 2.
Mendelian Randomization (MR) studies assess the causality of an exposure-disease association using genetic determinants [i.e. instrumental variables (IVs)] of the exposure. Power and IV strength requirements for MR studies using multiple genetic variants have not been explored.
We simulated cohort data sets consisting of a normally distributed disease trait, a normally distributed exposure that affects this trait, and a biallelic genetic variant that affects the exposure. Using two-stage least squares regression on 10,000 data sets, we estimated power to detect an effect of exposure on disease for varying allele frequencies, effect sizes and sample sizes. Stage 1 is a regression of exposure on the variant; Stage 2 is a regression of disease on the fitted exposure. Similar analyses were conducted using multiple genetic variants (5, 10, 20) as independent or combined IVs. We assessed IV strength using the first-stage F statistic.
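The single-variant simulation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `simulate_power`, the parameter values, the unmeasured confounder `u`, the unit error variances, and the normal-approximation critical value of 1.96 are all assumptions made for the example, and the default of 200 replicates is far below the 10,000 data sets used in the study.

```python
import numpy as np

def simulate_power(n, maf, beta_gx, beta_xy, n_sims=200, seed=0):
    """Estimate power of a single-variant 2SLS test by simulation.

    All parameter names and the data-generating model are illustrative
    assumptions, not the settings used in the paper.
    """
    rng = np.random.default_rng(seed)
    crit = 1.96  # two-sided 5% critical value (normal approximation)
    hits = 0
    for _ in range(n_sims):
        # Biallelic variant coded as 0/1/2 copies of the effect allele
        g = rng.binomial(2, maf, size=n).astype(float)
        u = rng.normal(size=n)  # unmeasured confounder (assumption)
        x = beta_gx * g + u + rng.normal(size=n)  # exposure
        y = beta_xy * x + u + rng.normal(size=n)  # disease trait

        # Stage 1: regress exposure on the variant, keep fitted values
        G = np.column_stack([np.ones(n), g])
        coef1, *_ = np.linalg.lstsq(G, x, rcond=None)
        x_hat = G @ coef1

        # Stage 2: regress disease trait on the fitted exposure
        Xh = np.column_stack([np.ones(n), x_hat])
        coef2, *_ = np.linalg.lstsq(Xh, y, rcond=None)

        # 2SLS standard error: residuals use the observed exposure,
        # not the fitted one
        X = np.column_stack([np.ones(n), x])
        resid = y - X @ coef2
        sigma2 = resid @ resid / (n - 2)
        cov = sigma2 * np.linalg.inv(Xh.T @ Xh)
        z = coef2[1] / np.sqrt(cov[1, 1])
        hits += abs(z) > crit
    return hits / n_sims

if __name__ == "__main__":
    print(simulate_power(n=2000, maf=0.3, beta_gx=0.5, beta_xy=0.5))
```

Setting `beta_xy=0` gives the empirical type I error rate rather than power, which is a quick sanity check on the simulation.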
Simulations of realistic scenarios indicate that MR studies will require large (n > 1000), often very large (n > 10,000), sample sizes. In many cases, so-called 'weak IV' problems arise when using multiple variants as independent IVs (even with as few as five), resulting in biased effect estimates. Combining genetic factors into fewer IVs results in modest power decreases, but alleviates weak IV problems. Ideal methods for combining genetic factors depend upon knowledge of the genetic architecture underlying the exposure.
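The contrast between using variants as independent IVs and combining them can be illustrated by comparing first-stage F statistics. The sketch below is an assumption-laden example rather than the study's procedure: it uses 10 variants with equal effects on the exposure and combines them into an unweighted allele count, a case where simple summation is a reasonable combining method. The helper `first_stage_f` and all numeric settings are hypothetical.

```python
import numpy as np

def first_stage_f(instruments, x):
    """F statistic for the joint test that all instrument coefficients
    in the first-stage regression of x on the instruments are zero."""
    n, k = instruments.shape
    Z = np.column_stack([np.ones(n), instruments])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    rss = np.sum((x - Z @ coef) ** 2)   # residual sum of squares
    tss = np.sum((x - x.mean()) ** 2)   # total sum of squares
    return ((tss - rss) / k) / (rss / (n - k - 1))

# Illustrative settings (assumptions, not values from the paper)
rng = np.random.default_rng(1)
n, k, maf, beta = 5000, 10, 0.3, 0.1

# k independent biallelic variants, each with the same effect on exposure
G = rng.binomial(2, maf, size=(n, k)).astype(float)
x = G @ np.full(k, beta) + rng.normal(size=n)

# Variants entered as k separate IVs versus one unweighted allele score
f_separate = first_stage_f(G, x)
f_score = first_stage_f(G.sum(axis=1, keepdims=True), x)

print(f"F with {k} separate IVs: {f_separate:.1f}")
print(f"F with one combined score: {f_score:.1f}")
```

The variance in the exposure explained by the instruments is similar in both cases, but the combined score spends one degree of freedom instead of ten, so its F statistic is larger. When variant effects differ substantially, an unweighted sum may be a poor choice, consistent with the point that the ideal combining method depends on the genetic architecture.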
The feasibility of well-powered, unbiased MR studies will depend upon the amount of variance in the exposure that can be explained by known genetic factors and the 'strength' of the IV set derived from these genetic factors.