IEEE Trans Neural Netw Learn Syst. 2020 Jun;31(6):2005-2019. doi: 10.1109/TNNLS.2019.2927636. Epub 2019 Aug 28.
Learning Markov blankets (MBs) plays an important role in many machine learning tasks, such as causal Bayesian network structure learning, feature selection, and domain adaptation. Because the variables in the MB of a target variable are causally related to the target, the MB can serve as the basis for learning the global structure of a causal Bayesian network or as a reliable and robust feature set for classification, either within the same domain or across domains. In this article, we study the problem of learning the MB of a target variable from multiple interventional data sets. Data sets obtained from interventional experiments contain richer causal information for MB discovery than passively observed (observational) data. However, almost all existing MB discovery methods are designed to learn MBs from a single observational data set. Learning MBs from multiple interventional data sets poses two challenges: 1) unknown intervention variables and 2) nonidentical data distributions. To address these challenges, we theoretically analyze: 1) under what conditions we can find the correct MB of a target variable and 2) under what conditions we can identify the causes of the target variable by discovering its MB. Based on this theoretical analysis, we propose a new algorithm for learning MBs from multiple interventional data sets, and we present the conditions/assumptions that ensure the correctness of the algorithm. To the best of our knowledge, this article is the first to present a theoretical analysis of the conditions for MB discovery in multiple interventional data sets, together with an algorithm that finds MBs under those conditions. Experiments on benchmark Bayesian networks and real-world data sets validate the effectiveness and efficiency of the proposed algorithm.
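For readers unfamiliar with the term, the following minimal Python sketch illustrates what an MB is when the graph is already known: under the usual faithfulness assumption, the MB of a target in a Bayesian network consists of its parents, its children, and the other parents of its children (spouses). This is only the graph-theoretic definition, not the algorithm proposed in the article, which learns the MB from data without access to the graph. The function name `markov_blanket` and the toy network `toy_dag` are hypothetical and chosen for illustration.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the parents, children, and spouses of `target`.

    `parents` maps every node of a DAG to the set of its parents.
    This reads the MB off a KNOWN graph; it does not learn anything from data.
    """
    direct_causes = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of the target's children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    return (direct_causes | children | spouses) - {target}


# Hypothetical toy network: A -> T, T -> C, B -> C, C -> D
toy_dag = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}

print(sorted(markov_blanket(toy_dag, "T")))  # ['A', 'B', 'C']
```

In this toy network, A is the direct cause of T, C is its child, and B is a spouse. D is excluded because it is independent of T given C. The parent A is the kind of variable the abstract refers to when it speaks of identifying the causes of the target via its MB.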