Choi Junsouk, Chapkin Robert S, Ni Yang
Department of Statistics, Korea University.
Department of Nutrition, Texas A&M University.
Ann Appl Stat. 2025 Sep;19(3):1908-1930. doi: 10.1214/25-aoas2042. Epub 2025 Aug 28.
Observational zero-inflated count data arise in many fields, including genomics. A common research goal is to identify causal relationships by learning the structure of a sparse directed acyclic graph (DAG). While structure learning of DAGs has been an active research area, existing methods do not adequately account for excess zeros and are therefore not suitable for modeling zero-inflated count data. Moreover, it is often of interest to study differences between the causal networks underlying data collected from two experimental groups (control vs. treatment). To explicitly account for zero-inflation and identify differential causal networks, we propose a novel Bayesian differential zero-inflated negative binomial DAG (DAG0) model. We prove that the causal relationships under the proposed DAG0 are fully identifiable from purely observational, cross-sectional data, using a general proof technique that is applicable beyond the proposed model. We develop Bayesian inference based on parallel-tempered Markov chain Monte Carlo to efficiently explore the multi-modal posterior landscape. We demonstrate the utility of the proposed DAG0 by comparing it with state-of-the-art alternative methods through extensive simulations. An application to a single-cell RNA-sequencing dataset generated under two experimental groups yields findings that appear to be consistent with existing knowledge. A user-friendly R package that implements DAG0 is available at https://github.com/junsoukchoi/BayesDAG0.git.
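For readers unfamiliar with the zero-inflated negative binomial (ZINB) distribution named in the abstract, the following is a minimal Python sketch of its probability mass function. It is a generic textbook parameterization (zero-inflation probability `pi`, mean `mu`, dispersion `r`), not the paper's exact model and not the API of the BayesDAG0 R package; in a DAG model these parameters would presumably depend on each node's parents, which is not shown here. The function names `nb_log_pmf` and `zinb_pmf` are hypothetical.

```python
import math


def nb_log_pmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu > 0 and dispersion r > 0."""
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def zinb_pmf(y, pi, mu, r):
    """Pmf of a zero-inflated negative binomial (assumed generic form).

    With probability pi the count is a structural zero; otherwise it is
    drawn from a negative binomial with mean mu and dispersion r.
    """
    nb = math.exp(nb_log_pmf(y, mu, r))
    structural_zero = pi if y == 0 else 0.0
    return structural_zero + (1.0 - pi) * nb


# Illustrative parameter values (assumptions, not taken from the paper).
pi, mu, r = 0.3, 2.0, 1.5
p_zero_zinb = zinb_pmf(0, pi, mu, r)
p_zero_nb = math.exp(nb_log_pmf(0, mu, r))
total_mass = sum(zinb_pmf(y, pi, mu, r) for y in range(500))
```

Under these assumed values the ZINB places more mass at zero than the plain negative binomial with the same `mu` and `r`, which is the "excess zeros" feature the abstract refers to.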