Deng Wanlu, Geng Zhi, Li Hongzhe
Department of Statistics and Probability, Peking University, Beijing 100871, PR China. Department of Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Ann Appl Stat. 2013;7(3):1249-1835. doi: 10.1214/13-aoas635.
Multivariate time series (MTS) data, such as time course gene expression data in genomics, are often collected to study the dynamic nature of a system. These data provide important information about the causal dependency among a set of random variables. In this paper, we introduce a computationally efficient algorithm to learn directed acyclic graphs (DAGs) from MTS data, focusing on the local structure around a given target variable. Our algorithm iteratively learns all parents (P), all children (C) and some descendants (D) (PCD), and uses the time order of the variables to orient the edges. This time series PCD-PCD algorithm (tsPCD-PCD) extends the previous PCD-PCD algorithm to dependent observations and uses composite likelihood ratio tests (CLRTs) to test conditional independence. We present the asymptotic distribution of the CLRT statistic and show that the tsPCD-PCD algorithm is guaranteed to recover the true DAG structure when the faithfulness condition holds and the tests correctly reject the null hypotheses. Simulation studies show that the CLRTs are valid and perform well even when the sample sizes are small. In addition, the tsPCD-PCD algorithm outperforms the PCD-PCD algorithm in recovering the local graph structures. We illustrate the algorithm by analyzing a time course gene expression data set related to mouse T-cell activation.
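The abstract's idea of using the time order of the variables to orient edges can be illustrated with a minimal sketch. It is not the tsPCD-PCD algorithm itself and involves no CLRT. It assumes that an undirected skeleton has already been learned by some other means, and that each node is encoded as a hypothetical `(variable_name, time_index)` pair. The function name `orient_by_time` and the choice to leave same-time edges undirected are likewise assumptions made for illustration.

```python
from typing import Iterable

# A node is a (variable_name, time_index) pair. This encoding is an
# assumption made for illustration, not notation taken from the paper.
Node = tuple[str, int]
Edge = tuple[Node, Node]


def orient_by_time(skeleton: Iterable[Edge]) -> tuple[list[Edge], list[Edge]]:
    """Split undirected edges into time-oriented and still-undirected edges.

    An edge between nodes at different time points is directed from the
    earlier node to the later one. An edge between two nodes at the same
    time point is returned unchanged, because time order alone cannot
    orient it.
    """
    directed: list[Edge] = []
    undirected: list[Edge] = []
    for a, b in skeleton:
        if a[1] < b[1]:
            directed.append((a, b))    # a is earlier, so a -> b
        elif a[1] > b[1]:
            directed.append((b, a))    # b is earlier, so b -> a
        else:
            undirected.append((a, b))  # same time point: left undirected
    return directed, undirected


# Hypothetical skeleton over three genes observed at time points 1 and 2.
skeleton = [
    (("g1", 2), ("g2", 1)),
    (("g1", 1), ("g1", 2)),
    (("g2", 2), ("g3", 2)),
]
directed, undirected = orient_by_time(skeleton)
```

Here the first two edges are directed from time point 1 to time point 2, while the edge between `g2` and `g3` at time point 2 stays undirected and would need further information, such as conditional independence tests, to be oriented.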