From the Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom.
MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom.
Epidemiology. 2019 May;30(3):350-357. doi: 10.1097/EDE.0000000000000972.
Participants in epidemiologic and genetic studies are rarely true random samples of the populations they are intended to represent, and both known and unknown factors can influence participation in a study (known as selection into a study). The circumstances in which selection causes bias in an instrumental variable (IV) analysis are not widely understood by practitioners of IV analyses. We use directed acyclic graphs (DAGs) to depict assumptions about the selection mechanism (factors affecting selection) and show how DAGs can be used to determine when a two-stage least squares IV analysis is biased by different selection mechanisms. Through simulations, we show that selection can result in a biased IV estimate with substantial confidence interval (CI) undercoverage, and the level of bias can differ between instrument strengths, a linear and nonlinear exposure-instrument association, and a causal and noncausal exposure effect. We present an application from the UK Biobank study, which is known to be a selected sample of the general population. Of interest was the causal effect of staying in school at least 1 extra year on the decision to smoke. Based on 22,138 participants, the two-stage least squares exposure estimates were very different between the IV analysis ignoring selection and the IV analysis which adjusted for selection (e.g., risk differences, 1.8% [95% CI, -1.5%, 5.0%] and -4.5% [95% CI, -6.6%, -2.4%], respectively). We conclude that selection bias can have a major effect on an IV analysis, and further research is needed on how to conduct sensitivity analyses when selection depends on unmeasured data.
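The qualitative point about selection mechanisms can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the data-generating model, the coefficients, the sample size, and the function names `two_stage_least_squares` and `simulate` are all assumptions chosen for illustration. It compares the two-stage least squares estimate in the full sample with estimates from a sample selected on the instrument alone and from a sample selected on the exposure, where the exposure is a collider between the instrument and an unmeasured confounder.

```python
import numpy as np


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument: regress x on z, then y on the fitted x."""
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    # The second coefficient is the estimated effect of the exposure on y.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0][1]


def simulate(n=200_000, beta=0.5, seed=0):
    """Illustrative model only; every coefficient here is an assumption."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                    # instrument
    u = rng.normal(size=n)                    # unmeasured confounder
    x = 0.5 * z + u + rng.normal(size=n)      # exposure
    y = beta * x + u + rng.normal(size=n)     # outcome

    # Selection depends on the instrument only.
    s_z = rng.random(n) < 1.0 / (1.0 + np.exp(-z))
    # Selection depends on the exposure, a collider of z and u.
    s_x = rng.random(n) < 1.0 / (1.0 + np.exp(-x))

    return {
        "full": two_stage_least_squares(z, x, y),
        "selected_on_z": two_stage_least_squares(z[s_z], x[s_z], y[s_z]),
        "selected_on_x": two_stage_least_squares(z[s_x], x[s_x], y[s_x]),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, the full-sample estimate and the estimate under selection on the instrument should both lie close to the true effect of 0.5. Selecting on the exposure induces an association between the instrument and the confounder among those selected, which pulls the estimate away from 0.5.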