Ilya Shpitser, Eric Tchetgen Tchetgen
Department of Computer Science, Johns Hopkins University, 3400 N Charles Street, Baltimore, Maryland 21218
School of Public Health, Harvard University, 677 Huntington Avenue, Kresge Building, Boston, Massachusetts 02115
Ann Stat. 2016 Dec;44(6):2433-2466. doi: 10.1214/15-AOS1411. Epub 2016 Nov 23.
Identifying causal parameters from observational data is fraught with subtleties due to selection bias and confounding. In addition, more complex questions of interest, such as effects of treatment on the treated and mediated effects, may not always be identified even in data where treatment assignment is known and under investigator control, or may be identified under one causal model but not another. Increasingly complex effects of interest, coupled with a diversity of causal models in use, have resulted in a fragmented view of identification. This fragmentation makes it unnecessarily difficult to determine whether a given parameter is identified (and in what model), and what assumptions must hold for this to be the case. This, in turn, complicates the development of estimation theory and sensitivity analysis procedures. In this paper, we give a unifying view of a large class of causal effects of interest, including novel effects not previously considered, in terms of a hierarchy of interventions, and show that identification theory for this large class reduces to an identification theory of random variables under interventions from this hierarchy. Moreover, we show that one type of intervention in the hierarchy is naturally associated with queries identified under the Finest Fully Randomized Causally Interpretable Structure Tree Graph (FFRCISTG) model of Robins (via the extended g-formula), and another is naturally associated with queries identified under the Non-Parametric Structural Equation Model with Independent Errors (NPSEM-IE) of Pearl, via a more general functional we call the edge g-formula. Our results motivate the study of estimation theory for the edge g-formula, since we show it arises both in mediation analysis and in settings where treatment assignment has unobserved causes, such as models associated with Pearl's front-door criterion.
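The abstract cites Pearl's front-door criterion as a setting where treatment assignment has unobserved causes yet the effect is still identified. As a minimal sketch, assuming binary variables and a made-up structural model (every parameter value and function name below is hypothetical, not taken from the paper), the following Python evaluates the classical front-door functional p(Y=1 | do(a)) = sum_m p(m | a) sum_{a'} p(Y=1 | m, a') p(a') from the observed joint alone and checks it against the true interventional probability computed with access to the hidden confounder U. It illustrates only the standard front-door formula, not the paper's general edge g-formula.

```python
from itertools import product

# Hypothetical structural model (illustrative assumption, not from the paper):
# U -> A and U -> Y with U unobserved; A -> M -> Y; no direct A -> Y edge.
P_U = {0: 0.6, 1: 0.4}                       # p(U = u)
P_A1_GIVEN_U = {0: 0.2, 1: 0.7}              # p(A = 1 | U = u)
P_M1_GIVEN_A = {0: 0.3, 1: 0.8}              # p(M = 1 | A = a)
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5,   # p(Y = 1 | M = m, U = u)
                 (1, 0): 0.4, (1, 1): 0.9}


def bern(p1, x):
    """Probability that a Bernoulli(p1) variable takes the value x in {0, 1}."""
    return p1 if x == 1 else 1.0 - p1


def observed_joint():
    """Observed p(a, m, y), obtained by marginalizing out the hidden U."""
    joint = {}
    for a, m, y in product((0, 1), repeat=3):
        joint[(a, m, y)] = sum(
            P_U[u] * bern(P_A1_GIVEN_U[u], a) * bern(P_M1_GIVEN_A[a], m)
            * bern(P_Y1_GIVEN_MU[(m, u)], y)
            for u in (0, 1))
    return joint


def true_interventional(a):
    """Ground-truth p(Y = 1 | do(A = a)); uses U, which an analyst never sees."""
    return sum(P_U[u] * bern(P_M1_GIVEN_A[a], m) * P_Y1_GIVEN_MU[(m, u)]
               for u in (0, 1) for m in (0, 1))


def front_door(joint, a):
    """Front-door functional for p(Y = 1 | do(A = a)) using only p(a, m, y)."""
    p_a = {x: sum(v for (ax, _, _), v in joint.items() if ax == x)
           for x in (0, 1)}
    p_am = {(x, m): sum(v for (ax, mx, _), v in joint.items()
                        if (ax, mx) == (x, m))
            for x in (0, 1) for m in (0, 1)}
    total = 0.0
    for m in (0, 1):
        p_m_given_a = p_am[(a, m)] / p_a[a]
        # Inner sum: sum over a' of p(Y = 1 | m, a') * p(a').
        inner = sum(joint[(ap, m, 1)] / p_am[(ap, m)] * p_a[ap] for ap in (0, 1))
        total += p_m_given_a * inner
    return total


if __name__ == "__main__":
    joint = observed_joint()
    for a in (0, 1):
        print(a, front_door(joint, a), true_interventional(a))
```

In this toy model the two columns agree up to floating-point error, whereas the naive conditional p(Y=1 | A=a) generally does not, because U confounds A and Y. Every probability is kept strictly between 0 and 1 so that the conditionals in `front_door` are well defined (positivity).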