Han Lei, Zhang Yu, Zhang Tong
Department of Statistics, Rutgers University.
Department of Computer Science and Engineering, Hong Kong University of Science and Technology.
KDD. 2016 Aug;2016:1585-1594. doi: 10.1145/2939672.2939851.
The maximum likelihood estimation (MLE) for the Gaussian graphical model, which is also known as the inverse covariance estimation problem, has gained increasing interest recently. Most existing works assume that inverse covariance estimators contain sparse structure and then construct models with the regularization. In this paper, different from existing works, we study the inverse covariance estimation problem from another perspective by efficiently modeling the low-rank structure in the inverse covariance, which is assumed to be a combination of a low-rank part and a diagonal matrix. One motivation for this assumption is that the low-rank structure is common in many applications including the climate and financial analysis, and another one is that such assumption can reduce the computational complexity when computing its inverse. Specifically, we propose an efficient COmponent Pursuit (COP) method to obtain the low-rank part, where each component can be sparse. For optimization, the COP method greedily learns a rank-one component in each iteration by maximizing the log-likelihood. Moreover, the COP algorithm enjoys several appealing properties including the existence of an efficient solution in each iteration and the theoretical guarantee on the convergence of this greedy approach. Experiments on large-scale synthetic and real-world datasets including thousands of millions variables show that the COP method is faster than the state-of-the-art techniques for the inverse covariance estimation problem when achieving comparable log-likelihood on test data.
高斯图形模型的最大似然估计(MLE),也被称为逆协方差估计问题,近年来受到了越来越多的关注。大多数现有工作假设逆协方差估计器包含稀疏结构,然后使用正则化来构建模型。在本文中,与现有工作不同,我们从另一个角度研究逆协方差估计问题,通过有效地对逆协方差中的低秩结构进行建模,该低秩结构被假设为一个低秩部分和一个对角矩阵的组合。做出这种假设的一个动机是,低秩结构在包括气候和金融分析在内的许多应用中很常见,另一个动机是这种假设可以降低计算其逆矩阵时的计算复杂度。具体来说,我们提出了一种有效的成分追踪(COP)方法来获得低秩部分,其中每个成分可以是稀疏的。为了进行优化,COP方法在每次迭代中通过最大化对数似然来贪婪地学习一个秩一成分。此外,COP算法具有几个吸引人的特性,包括每次迭代中存在有效解以及这种贪婪方法收敛的理论保证。在包含数千万个变量的大规模合成数据集和真实世界数据集上的实验表明,在测试数据上实现可比对数似然时,COP方法在逆协方差估计问题上比现有技术更快。