Li Chunlin, Shen Xiaotong, Pan Wei
School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA.
Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA.
J Mach Learn Res. 2023 Jan-Dec;24.
Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions under which the model is identifiable. Unlike classical inference, testing directed relations requires identifying the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty of identifying ancestors and interventions. We also show that the distribution of a data perturbation test statistic converges to the target distribution. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to inferring gene regulatory networks. The R implementation is available at https://github.com/chunlinli/intdag.
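The abstract does not spell out the form of the test statistic. As a generic illustration only, and not the authors' method or the intdag implementation, the sketch below assumes a Gaussian linear model in which the ancestors and interventions of a primary variable are already known. It tests a single hypothesized edge y_k -> y_j with the classical likelihood ratio statistic n log(RSS_0 / RSS_1), referred to a chi-square distribution with 1 degree of freedom. The peeling step and the data perturbation scheme are deliberately omitted, and the helper names `rss` and `lr_test_edge` are hypothetical.

```python
import math

import numpy as np


def rss(y, X):
    """Residual sum of squares of the least-squares fit of y on [1, X]."""
    n = y.shape[0]
    design = np.column_stack([np.ones(n), X])  # intercept plus adjustment columns
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def lr_test_edge(y_j, y_k, Z):
    """Hypothetical helper: Gaussian LR test of H0 'no direct effect y_k -> y_j'.

    Z (shape n x p) holds the columns assumed to be the known ancestors and
    relevant interventions of y_j; estimating them is NOT done here.
    """
    n = y_j.shape[0]
    rss_null = rss(y_j, Z)                          # model without the edge
    rss_alt = rss(y_j, np.column_stack([Z, y_k]))   # model with the edge
    stat = n * math.log(rss_null / rss_alt)         # profile-likelihood ratio
    # Upper tail of chi-square(1): P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    x = rng.normal(size=n)               # an intervention variable (assumed known)
    y1 = x + rng.normal(size=n)
    y2 = 0.8 * y1 + rng.normal(size=n)   # true edge y1 -> y2
    print(lr_test_edge(y2, y1, x.reshape(-1, 1)))
```

Because the adjustment set Z is treated as fixed, this sketch ignores exactly the uncertainty from estimating ancestors and interventions that the data perturbation scheme described above is meant to account for.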