Liu Han, Roeder Kathryn, Wasserman Larry
Carnegie Mellon University, Pittsburgh, PA 15213.
Adv Neural Inf Process Syst. 2010 Dec 31;24(2):1432-1440.
A challenging problem in estimating high-dimensional graphical models is choosing the regularization parameter in a data-dependent way. The standard techniques include K-fold cross-validation (K-CV), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high-dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high-dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: that is, with high probability, all the true edges will be included in the selected model, even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all of these competing procedures.
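The abstract states the selection rule only in words: use the least regularization for which the estimated graph stays sparse and replicable under random sampling. A minimal Python sketch of that idea follows. Everything specific in it is an assumption rather than something taken from the abstract: the function names are hypothetical, a thresholded sample correlation stands in for a real sparse graph estimator such as the graphical lasso, and the instability score 2θ(1−θ), the cutoff beta = 0.05, and the subsample size of about 10√n are illustrative choices.

```python
import math
import random
from itertools import combinations


def abs_correlations(rows):
    """Absolute sample correlation for every variable pair (j, k) with j < k."""
    centered = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    out = {}
    for j, k in combinations(range(len(centered)), 2):
        denom = norms[j] * norms[k]
        dot = sum(a * b for a, b in zip(centered[j], centered[k]))
        out[(j, k)] = abs(dot / denom) if denom > 0 else 0.0
    return out


def instability_path(rows, lambdas, n_subsamples=20, subsample_size=None, seed=0):
    """Average edge instability 2*theta*(1-theta) for each value in lambdas.

    Stand-in estimator (assumption): edge (j, k) is present when the absolute
    sample correlation exceeds lam. The same subsamples are reused for every lam.
    """
    rng = random.Random(seed)
    n = len(rows)
    if subsample_size is None:
        # Assumed default: roughly 10 * sqrt(n), capped below the sample size.
        subsample_size = min(n - 1, max(2, int(10 * math.sqrt(n))))
    corrs = [abs_correlations(rng.sample(rows, subsample_size))
             for _ in range(n_subsamples)]
    path = {}
    for lam in lambdas:
        total = 0.0
        for pair in corrs[0]:
            # theta: fraction of subsamples in which this edge is selected.
            theta = sum(c[pair] > lam for c in corrs) / n_subsamples
            total += 2 * theta * (1 - theta)
        path[lam] = total / len(corrs[0])
    return path


def stars_select(rows, lambdas, beta=0.05, **kwargs):
    """Smallest lam whose running-maximum instability stays at or below beta.

    Walks from the most to the least regularization and stops at the first
    value that pushes the running maximum above beta. Returns None when even
    the largest lam is too unstable.
    """
    path = instability_path(rows, lambdas, **kwargs)
    chosen, running_max = None, 0.0
    for lam in sorted(lambdas, reverse=True):
        running_max = max(running_max, path[lam])
        if running_max > beta:
            break
        chosen = lam
    return chosen
```

Calling `stars_select(rows, lambdas)` with `rows` as a list of equal-length numeric observations returns the chosen value from `lambdas`. The running maximum matters because a nearly unregularized graph can look stable simply by being almost fully connected; without it, the rule could jump past the unstable middle of the path and pick a dense graph.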