Lin Lina, Drton Mathias, Shojaie Ali
Department of Statistics, University of Washington, Seattle, WA 98195, U.S.A.
Department of Biostatistics, University of Washington, Seattle, WA 98195, U.S.A.
Electron J Stat. 2016;10(1):806-854. doi: 10.1214/16-EJS1126. Epub 2016 Apr 6.
Graphical models are widely used to model stochastic dependences among large collections of variables. We introduce a new method of estimating undirected conditional independence graphs based on the score matching loss, introduced by Hyvärinen (2005) and subsequently extended in Hyvärinen (2007). The method we propose applies to settings with continuous observations and allows for computationally efficient treatment of possibly non-Gaussian exponential family models. In the well-explored Gaussian setting, regularized score matching avoids the issues of asymmetry that arise when applying the technique of neighborhood selection. Compared to existing methods that directly yield symmetric estimates, the score matching approach has the advantage that the considered loss is quadratic and gives piecewise linear solution paths under ℓ1 regularization. Under suitable irrepresentability conditions, we show that ℓ1-regularized score matching is consistent for graph estimation in sparse high-dimensional settings. Through numerical experiments and an application to RNAseq data, we confirm that regularized score matching achieves state-of-the-art performance in the Gaussian case and provides a valuable tool for computationally efficient estimation in non-Gaussian graphical models.
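To make the "quadratic loss" remark concrete, here is a minimal sketch for the centered Gaussian case only. For a density with precision matrix K and sample covariance S, Hyvärinen's score matching loss reduces to L(K) = (1/2) tr(K S K) - tr(K), which is quadratic in K and has gradient (1/2)(K S + S K) - I. The solver below is a plain proximal gradient (ISTA) loop and is an assumption made for illustration, not the algorithm used in the paper. The function names, the choice to penalize only off-diagonal entries, the step size, and the iteration count are likewise hypothetical.

```python
import numpy as np


def score_matching_loss(K, S):
    """Gaussian score matching loss: 0.5 * tr(K S K) - tr(K)."""
    return 0.5 * np.trace(K @ S @ K) - np.trace(K)


def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal map of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def regularized_score_matching(S, lam, n_iter=500):
    """Proximal gradient sketch for
        min_K  0.5 * tr(K S K) - tr(K) + lam * sum_{i != j} |K_ij|.

    S   : (p, p) symmetric positive definite sample covariance
    lam : l1 penalty level (assumed here to act on off-diagonals only)
    """
    p = S.shape[0]
    K = np.eye(p)
    off_diag = ~np.eye(p, dtype=bool)
    # The gradient map K -> 0.5 * (K S + S K) is Lipschitz with constant
    # at most the largest eigenvalue of S, so 1 / lambda_max is a safe step.
    step = 1.0 / np.linalg.eigvalsh(S)[-1]
    for _ in range(n_iter):
        grad = 0.5 * (K @ S + S @ K) - np.eye(p)
        K = K - step * grad
        # Shrink only the off-diagonal entries; symmetry is preserved
        # because the gradient and the threshold are both symmetric.
        K[off_diag] = soft_threshold(K[off_diag], step * lam)
    return K


if __name__ == "__main__":
    # Synthetic data with a chain-structured (tridiagonal) precision matrix.
    rng = np.random.default_rng(0)
    K_true = np.array([[2.0, 0.5, 0.0],
                       [0.5, 2.0, 0.5],
                       [0.0, 0.5, 2.0]])
    X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(K_true), size=2000)
    S = X.T @ X / X.shape[0]
    K_hat = regularized_score_matching(S, lam=0.05)
    print(np.round(K_hat, 3))
```

The nonzero off-diagonal entries of the returned matrix define the estimated graph. With `lam=0` the minimizer is the inverse of S, and a large enough `lam` yields a diagonal estimate, that is, an empty graph.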