Hogan Joseph W, Lancaster Tony
Center for Statistical Sciences, Department of Community Health, Box G-H, Brown University, Providence, RI 02912, USA.
Stat Methods Med Res. 2004 Feb;13(1):17-48. doi: 10.1191/0962280204sm351ra.
Inferring causal effects from longitudinal repeated measures data is highly relevant to a number of research areas, including economics, social sciences and epidemiology. In observational studies in particular, the treatment receipt mechanism is typically not under the control of the investigator; it can depend on various factors, including the outcome of interest. This results in differential selection into treatment levels, and can lead to selection bias when standard routines such as least squares regression are used to estimate causal effects. Interestingly, both the characterization of selection bias and the methodology for handling it can differ substantially by disciplinary tradition. In social sciences and economics, instrumental variables (IV) estimation is the standard method for fitting linear and nonlinear models in which the error term may be correlated with an observed covariate. When such correlation is not ruled out, the covariate is called endogenous and least squares estimates of the covariate effect are typically biased. When an instrumental variable is available, it can be used to reduce or eliminate the bias. In public health and clinical medicine (e.g., epidemiology and biostatistics), selection bias is typically viewed in terms of confounders, and the prevailing methods are geared toward making proper adjustments via explicit use of observed confounders (e.g., stratification, standardization). A class of methods known as inverse probability weighting (IPW) estimators, which rely on modeling selection in terms of confounders, is gaining popularity for making such adjustments. Our objective is to review and compare IPW and IV for estimating causal treatment effects from longitudinal data, where the treatment may vary with time. We accomplish this by defining the causal estimands in terms of a linear stochastic model of potential outcomes (counterfactuals).
Our comparison includes a review of terminology typically used in discussions of causal inference (e.g., confounding, endogeneity); a review of the assumptions required to identify causal effects and their implications for estimation and interpretation; a description of estimation via inverse weighting and instrumental variables; and a comparative analysis of data from a longitudinal cohort study of HIV-infected women. In our discussion of assumptions and estimation routines, we emphasize sufficient conditions needed to implement relatively standard analyses that can essentially be formulated as regression models. In that sense this review is geared toward the quantitative practitioner. The objective of the data analysis is to estimate the causal (therapeutic) effect of receiving combination antiviral therapy on longitudinal CD4 cell counts, where receipt of therapy varies with time and depends on CD4 count and other covariates. Assumptions are reviewed in context, and the resulting inferences are compared. The analysis illustrates the importance of considering the existence of unmeasured confounding and of checking for 'weak instruments.' It also suggests that IV methodology may have a role in longitudinal cohort studies where potential instrumental variables are available.
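The contrast between the two approaches can be illustrated with a small simulation. The following is only a minimal sketch under strong simplifying assumptions, not the analysis reported in the paper: it uses a single time point rather than longitudinal data, a constant treatment effect, a binary instrument, and invented coefficients. All function names (`simulate`, `naive`, `ipw`, `iv_wald`) and the parameter `u_strength` are hypothetical and introduced here for illustration.

```python
import numpy as np

def simulate(n=200_000, effect=2.0, u_strength=0.0, seed=0):
    """Hypothetical single-time-point data; every coefficient is made up."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)             # measured confounder
    u = rng.normal(size=n)             # unmeasured confounder
    z = rng.binomial(1, 0.5, size=n)   # instrument: affects y only through a
    logit = -0.5 + c + u_strength * u + 1.5 * z
    a = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))   # treatment received
    y = effect * a + c + u_strength * u + rng.normal(size=n)
    return c, z, a, y

def naive(a, y):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    return y[a == 1].mean() - y[a == 0].mean()

def fit_logistic(X, a, iters=25):
    """Logistic regression fitted by plain Newton-Raphson iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, X.T @ (a - p))
    return beta

def ipw(c, z, a, y):
    """Normalized inverse-probability-weighted difference in means.
    The treatment model uses only the observed covariates c and z."""
    X = np.column_stack([np.ones_like(c), c, z])
    p = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, a)))
    w1, w0 = a / p, (1 - a) / (1 - p)
    return (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()

def iv_wald(z, a, y):
    """Simple IV (Wald) estimator: cov(z, y) / cov(z, a)."""
    return np.cov(z, y)[0, 1] / np.cov(z, a)[0, 1]

if __name__ == "__main__":
    for s in (0.0, 1.0):   # without and with unmeasured confounding
        c, z, a, y = simulate(u_strength=s)
        print(f"u_strength={s}: naive={naive(a, y):.2f} "
              f"ipw={ipw(c, z, a, y):.2f} iv={iv_wald(z, a, y):.2f}")
```

In this toy setup, weighting on the measured confounder is enough when `u_strength` is zero, whereas once the unmeasured confounder is switched on the IV estimate is the one expected to stay near the simulated effect. That relies on the instrument being valid and not weak, which is assumed by construction here and would need to be checked in real data.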