Faculty of Arts and Sciences, Department of Statistics, Harvard University, Cambridge, MA, USA.
Stat Methods Med Res. 2019 Jul;28(7):1958-1978. doi: 10.1177/0962280217740609. Epub 2017 Nov 29.
Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; that is, the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized data set. This multistage effort with thought-provoking tasks involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data are observed; (3) a statistical analysis stage comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects. Stages 2 and 3 may rely on modern computing to implement the effort, whereas Stage 1 demands careful scientific argumentation to make the embedding plausible to scientific readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. The conceptually most demanding tasks are often the most scientifically interesting to the dedicated researcher and readers of the resulting statistical analyses. This perspective is rarely implemented with any rigor; for example, the first stage is often eschewed completely.
We illustrate our approach with an example examining the effect of parental smoking on children's lung function, using data collected in families living in East Boston in the 1970s.
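To make the design and statistical analysis stages concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes a single matching covariate (age), a greedy nearest-neighbor pairing rule with a caliper, and a Fisher randomization test for a hypothetical paired experiment. All function names, unit labels, and numbers are invented for illustration and are not taken from the East Boston data.

```python
import random
from statistics import mean


def greedy_pair_match(exposed, controls, caliper):
    """Design stage (illustrative): pair each exposed unit with the closest
    unused non-exposed unit on one covariate. Outcomes are never consulted."""
    available = dict(controls)  # unit id -> covariate value
    pairs = []
    for eid, x in sorted(exposed.items()):
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - x))
        if abs(available[cid] - x) <= caliper:
            pairs.append((eid, cid))
            del available[cid]  # match without replacement
    return pairs


def paired_randomization_test(diffs, n_draws=10000, seed=0):
    """Analysis stage (illustrative): Monte Carlo Fisher randomization test of
    the sharp null of no effect, assuming exposure was assigned within each
    pair by a fair coin flip in the hypothetical experiment."""
    rng = random.Random(seed)
    observed = mean(diffs)
    extreme = 0
    for _ in range(n_draws):
        # Under the sharp null, swapping labels within a pair flips the sign.
        simulated = mean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(simulated) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_draws + 1)


# Toy, made-up data: ages in years and a lung-function-like outcome.
exposed_age = {"e1": 8, "e2": 10, "e3": 12}
control_age = {"c1": 8, "c2": 11, "c3": 12, "c4": 15}
outcome = {"e1": 1.6, "c1": 1.8, "e2": 2.0, "c2": 2.3,
           "e3": 2.5, "c3": 2.6, "c4": 3.0}

# Stage 2: build the matched design before looking at any outcome.
pairs = greedy_pair_match(exposed_age, control_age, caliper=1)

# Stage 3: only now compare outcomes within the matched pairs.
diffs = [outcome[e] - outcome[c] for e, c in pairs]
observed, p_value = paired_randomization_test(diffs)
print(pairs, round(observed, 3), round(p_value, 3))
```

The separation between the two functions mirrors the ordering in the text: the matching function takes no outcome argument, so the design cannot be tuned toward a desired result. With only three pairs the randomization distribution is very coarse, which is why such a toy example cannot yield a small p-value.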