Department of Mathematics, University of Siegen, Walter-Flex-Str. 3, Siegen, Germany.
BMC Med Res Methodol. 2022 Feb 27;22(1):58. doi: 10.1186/s12874-021-01473-w.
Causal inference has become increasingly popular in medical research. Estimating causal effects from observational data allows conclusions to be drawn when randomized controlled trials cannot be conducted. Although the identification of structural causal models (SCM) and the calculation of structural coefficients have received much attention, a key requirement for valid causal inference is that conclusions are drawn based on the true data-generating model.
It remains largely unknown how large the probability is of rejecting the true structural causal model when observational data are sampled from it. This probability, the causal false-positive risk, is crucial, as rejection of the true causal model can induce bias in the estimation of causal effects. In this paper, the widely used causal models of confounders and colliders are studied with regard to their causal false-positive risk in linear Markovian models. A simulation study is carried out which investigates the causal false-positive risk in Gaussian linear Markovian models. To this end, the testable implications of the DAGs corresponding to confounders and colliders are analyzed from a Bayesian perspective. Furthermore, the induced bias in estimating the structural coefficients and causal effects is studied.
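A simulation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: it swaps the Bayesian analysis for a frequentist Fisher z-test, assumes a fork Z → X, Z → Y without a direct X → Y edge as the confounder model (testable implication: X independent of Y given Z) and X → Z ← Y as the collider model (testable implication: X independent of Y), and uses hypothetical coefficients, sample size, and function names.

```python
import math
import numpy as np

def fisher_z_pvalue(r, n, k=0):
    """Two-sided p-value for a (partial) correlation r with k conditioning variables."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def partial_corr(x, y, z):
    """Correlation of x and y after regressing both on z (with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

def simulate_confounder(rng, n, a=0.5, b=0.5):
    # Assumed fork: Z -> X and Z -> Y, standard Gaussian noise.
    z = rng.normal(size=n)
    x = a * z + rng.normal(size=n)
    y = b * z + rng.normal(size=n)
    return x, y, z

def simulate_collider(rng, n, a=0.5, b=0.5):
    # Assumed collider: X -> Z <- Y, standard Gaussian noise.
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    z = a * x + b * y + rng.normal(size=n)
    return x, y, z

def rejection_rate(model, n=100, reps=2000, alpha=0.05, seed=0):
    """Share of datasets drawn from the true model whose testable implication is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        if model == "confounder":
            x, y, z = simulate_confounder(rng, n)
            p = fisher_z_pvalue(partial_corr(x, y, z), n, k=1)
        else:
            x, y, z = simulate_collider(rng, n)
            p = fisher_z_pvalue(np.corrcoef(x, y)[0, 1], n, k=0)
        rejections += p < alpha
    return rejections / reps

if __name__ == "__main__":
    for model in ("confounder", "collider"):
        print(model, rejection_rate(model))
```

With a frequentist test, the rejection rate of the true model is pinned near the chosen `alpha` by construction; a Bayesian test based on Bayes factors or posterior intervals has no such built-in calibration, which is why its false-positive risk has to be studied by simulation.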
The results show that the false-positive risk of rejecting a true SCM is substantial even for simple building blocks like confounders and colliders. Importantly, estimation of average, direct and indirect causal effects can become strongly biased if the true model is rejected. The causal false-positive risk may thus serve as an indicator or proxy for the induced bias.
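One way such bias can arise is sketched below for the collider X → Z ← Y, where the true effect of X on Y is zero. The scenario is an assumption made for illustration and is not taken from the paper: an analyst who has rejected the collider model treats Z as a confounder and adjusts for it. With unit-variance Gaussian noise and the hypothetical coefficients a = b = 0.5, the adjusted coefficient of X converges to -ab/(1 + b²) = -0.2 instead of 0.

```python
import numpy as np

def ols_coefs(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def collider_bias_demo(n=100_000, a=0.5, b=0.5, seed=0):
    """Return the coefficient of X on Y without and with adjustment for the collider Z."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = rng.normal(size=n)                  # Y does not depend on X: true effect is 0
    z = a * x + b * y + rng.normal(size=n)  # Z is a common effect of X and Y
    unadjusted = ols_coefs(y, x)[1]         # consistent with the true collider model
    adjusted = ols_coefs(y, x, z)[1]        # wrongly treats Z as a confounder
    return unadjusted, adjusted

if __name__ == "__main__":
    unadjusted, adjusted = collider_bias_demo()
    print(f"unadjusted: {unadjusted:.3f}, adjusted for Z: {adjusted:.3f}")
```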
While the identification of structural coefficients and the testable implications of causal models have been studied rigorously in the literature, this paper shows that causal inference must also develop new concepts for controlling the causal false-positive risk. Although a high risk cannot be equated with a substantial bias, it is indicative of the induced bias. This calls for the development of more advanced measures of the risk of committing a causal type I error in causal inference.