Zhou Fangting, He Kejun, Ni Yang
Department of Biostatistics, Yale University, New Haven, CT 06520, United States.
Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing 100872, China.
Biometrics. 2025 Jul 3;81(3). doi: 10.1093/biomtc/ujaf089.
Directed acyclic graphical models with additive noise are essential in nonlinear causal discovery and have numerous applications in domains such as social science and systems biology. Most such models further assume that the structural causal functions are additive to ensure causal identifiability and computational feasibility, an assumption that may be too restrictive in the presence of causal interactions. Some methods accommodate interactions by using general nonlinear causal functions represented by, for example, Gaussian processes or neural networks. However, they are either computationally intensive or lack interpretability. We propose a highly interpretable and computationally feasible approach that uses trees to incorporate interactions in nonlinear causal discovery, termed tree-based additive noise models. The tree construction yields piecewise constant causal functions, so existing causal identifiability results for additive noise models, which were established for continuous and smooth causal functions, do not apply. We therefore provide new conditions under which the proposed model is identifiable. We develop a recursive algorithm for source node identification and a score-based ordering search algorithm. Through extensive simulations, we demonstrate the utility of the proposed model and algorithms, benchmarked against existing additive noise models, especially when there are strong causal interactions. We apply our method to infer a protein-protein interaction network for breast cancer, where proteins may form protein complexes to perform their functions.
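The abstract's point that additive causal functions can miss causal interactions can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes a hypothetical XOR-type piecewise constant causal function of two parents, Gaussian additive noise, and split points known to be 0.5. It compares the residual error of a two-level tree (one constant per cell) with that of an additive fit (one step function per parent, estimated by backfitting). All function names and parameter values are illustrative assumptions.

```python
import random


def simulate(n, noise_sd=0.3, seed=0):
    """Draw (x1 > 0.5, x2 > 0.5, y) with y = f(x1, x2) + Gaussian noise.

    f is a hypothetical XOR-type piecewise constant function: it equals 2
    when exactly one parent exceeds 0.5 and 0 otherwise (a pure interaction).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        f = 2.0 if (x1 > 0.5) != (x2 > 0.5) else 0.0
        rows.append((x1 > 0.5, x2 > 0.5, f + rng.gauss(0.0, noise_sd)))
    return rows


def mean(values):
    return sum(values) / len(values)


def tree_mse(rows):
    """Two-level tree with splits assumed at 0.5: predict each cell by its mean."""
    cells = {}
    for a, b, y in rows:
        cells.setdefault((a, b), []).append(y)
    leaf = {key: mean(ys) for key, ys in cells.items()}
    return mean([(y - leaf[(a, b)]) ** 2 for a, b, y in rows])


def additive_mse(rows, sweeps=50):
    """Additive fit mu + g1(x1) + g2(x2), each g a step at 0.5, via backfitting."""
    mu = mean([y for _, _, y in rows])
    g1 = {True: 0.0, False: 0.0}
    g2 = {True: 0.0, False: 0.0}
    for _ in range(sweeps):
        for level in (True, False):
            g1[level] = mean([y - mu - g2[b] for a, b, y in rows if a == level])
        for level in (True, False):
            g2[level] = mean([y - mu - g1[a] for a, b, y in rows if b == level])
    return mean([(y - mu - g1[a] - g2[b]) ** 2 for a, b, y in rows])


rows = simulate(2000)
print("tree MSE:", round(tree_mse(rows), 3))
print("additive MSE:", round(additive_mse(rows), 3))
```

Under these assumptions the tree's residual error should sit near the noise variance, while the additive fit should leave most of the interaction signal in the residuals. A real analysis would also have to learn the split points and the causal ordering, which this sketch takes as given.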