Mukherjee Chiranjit, Rodriguez Abel
Netflix, Los Gatos, CA
Department of Applied Mathematics and Statistics, University of California, Santa Cruz, CA 95064
J Comput Graph Stat. 2016;25(3):762-788. doi: 10.1080/10618600.2015.1037883. Epub 2016 Aug 5.
Gaussian graphical models are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of Gaussian graphical models extends this model to the more realistic scenario where observations come from a heterogeneous population composed of a small number of homogeneous subgroups. In this paper we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable Gaussian graphical models. Further, we investigate how to harness the massive thread-parallelization capabilities of graphics processing units to accelerate computation. We demonstrate the computational advantages of our algorithms on simulated data of moderate dimension, where we compare our stochastic search with a Markov chain Monte Carlo algorithm. In these experiments, the stochastic search substantially outperforms the Markov chain Monte Carlo algorithm both in computing time and in the quality of the posterior mode discovered. Finally, we analyze a gene expression dataset for which Markov chain Monte Carlo algorithms are too slow to be practically useful.
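To make the model class concrete, the following is a minimal sketch of data generated from a mixture of Gaussian graphical models. It is an illustration only and not the stochastic search algorithm of the paper: it assumes a fixed number of two components instead of a Dirichlet process, and it assumes a chain graph (which is decomposable) for every component. All function names, weights, means, and precision values are hypothetical choices made for this example.

```python
import math
import random

def chain_precision(p, off_diag):
    """Precision matrix of a chain graph 1-2-...-p (assumed example graph).
    A zero entry K[i][j] means variables i and j are conditionally independent."""
    K = [[0.0] * p for _ in range(p)]
    for i in range(p):
        K[i][i] = 1.0
        if i + 1 < p:
            K[i][i + 1] = K[i + 1][i] = off_diag
    return K

def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be positive definite)."""
    p = len(K)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_gaussian(mu, L, rng):
    """Draw x ~ N(mu, K^{-1}) by solving L^T y = z with z ~ N(0, I)."""
    p = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    y = [0.0] * p
    for i in reversed(range(p)):
        s = z[i] - sum(L[k][i] * y[k] for k in range(i + 1, p))
        y[i] = s / L[i][i]
    return [m + v for m, v in zip(mu, y)]

def sample_mixture(n, weights, means, precisions, seed=0):
    """Each observation first picks a subgroup, then draws from that
    subgroup's Gaussian graphical model."""
    rng = random.Random(seed)
    factors = [cholesky(K) for K in precisions]
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    data = [sample_gaussian(means[k], factors[k], rng) for k in labels]
    return data, labels

# Hypothetical settings: two subgroups in five dimensions.
p = 5
precisions = [chain_precision(p, 0.4), chain_precision(p, -0.3)]
means = [[0.0] * p, [3.0] * p]
data, labels = sample_mixture(500, [0.6, 0.4], means, precisions)
```

Working with the precision matrix directly keeps the sparsity visible: the zeros in each `K` encode the missing edges of that subgroup's graph, and sampling needs only a Cholesky factor and one triangular solve per draw.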