Kolar Mladen, Liu Han, Xing Eric P
The University of Chicago Booth School of Business, Chicago, Illinois 60637, USA.
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544, USA.
J Mach Learn Res. 2014 May;15(May):1713-1750.
Undirected graphical models are important in a number of modern applications that involve exploring or exploiting dependency structures underlying the data. For example, they are often used to explore complex systems where connections between entities are not well understood, such as in functional brain networks or genetic networks. Existing methods for estimating the structure of undirected graphical models focus on scenarios where each node represents a scalar random variable, such as a binary neural activation state or a continuous mRNA abundance measurement, even though in many real-world problems nodes can represent multivariate variables with much richer meanings, such as whole images, text documents, or multi-view feature vectors. In this paper, we propose a new principled framework for estimating the structure of undirected graphical models from such multivariate (or multi-attribute) nodal data. The structure of a graph is inferred through estimation of non-zero partial canonical correlations between nodes. Under a Gaussian model, this strategy is equivalent to estimating conditional independencies between random vectors represented by the nodes, and it generalizes the classical problem of covariance selection (Dempster, 1972). We relate the problem of estimating non-zero partial canonical correlations to maximizing a penalized Gaussian likelihood objective and develop a method that efficiently maximizes this objective. Extensive simulation studies demonstrate the effectiveness of the method under various conditions. We provide illustrative applications to uncovering gene regulatory networks from gene and protein profiles, and uncovering a brain connectivity graph from positron emission tomography data. Finally, we provide sufficient conditions under which the true graphical structure can be recovered correctly.
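The equivalence stated above for the Gaussian case can be illustrated with a small sketch. It is not the estimator proposed in the paper: it assumes the precision (inverse covariance) matrix is already known and only reads the graph off its blocks. Two multi-attribute nodes are conditionally independent given the rest exactly when their off-diagonal precision block is zero. The function name `block_edges`, the tolerance `tol`, and all numbers in the toy matrix are hypothetical choices made for illustration.

```python
import numpy as np

def block_edges(precision, block_sizes, tol=1e-8):
    """List node pairs (a, b) whose off-diagonal precision block is non-zero.

    Hypothetical helper: `precision` is a known block precision matrix and
    `block_sizes` gives the number of attributes attached to each node.
    """
    offsets = np.concatenate(([0], np.cumsum(block_sizes)))
    edges = []
    for a in range(len(block_sizes)):
        for b in range(a + 1, len(block_sizes)):
            block = precision[offsets[a]:offsets[a + 1],
                              offsets[b]:offsets[b + 1]]
            # Frobenius norm of the block; zero means no edge between a and b.
            if np.linalg.norm(block) > tol:
                edges.append((a, b))
    return edges

# Toy chain graph 0 - 1 - 2 with two attributes per node (made-up numbers).
k = 2
coupling = 0.3 * np.ones((k, k))
zero = np.zeros((k, k))
precision = np.block([
    [2 * np.eye(k), coupling,      zero],
    [coupling.T,    2 * np.eye(k), coupling],
    [zero,          coupling.T,    2 * np.eye(k)],
])

edges = block_edges(precision, [k, k, k])

# Nodes 0 and 2 are marginally dependent (non-zero covariance block) even
# though they are conditionally independent given node 1.
covariance = np.linalg.inv(precision)
marginal_02 = np.linalg.norm(covariance[0:k, 2 * k:3 * k])
```

In this toy example `edges` contains only the pairs (0, 1) and (1, 2), while `marginal_02` is strictly positive. This is why structure is read from the precision matrix rather than the covariance matrix. With real data the precision matrix is unknown and has to be estimated, which is the problem the penalized likelihood approach in the abstract addresses.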