Shen Xiaotong, Huang Hsin-Cheng, Pan Wei
School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455, U.S.A.
Biometrika. 2012 Dec;99(4):899-914. doi: 10.1093/biomet/ass038. Epub 2012 Oct 18.
In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph. Each predictor corresponds to one node in the graph, and a path connecting two nodes indicates a priori that the corresponding predictors may be grouped; the method estimates both the homogeneous groups, or clusters, and the informative predictors. It seeks a parsimonious model with high predictive power by identifying and collapsing homogeneous groups of regression coefficients. To address the computational challenges, we present an efficient algorithm that integrates augmented Lagrange multiplier, coordinate descent and difference-of-convex methods. We prove that the proposed method not only identifies the true homogeneous groups and informative features consistently but also leads to accurate parameter estimation. A gene network dataset is analysed to demonstrate that the method can make a difference by exploring dependency structures among the genes.
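As a rough illustration of what "collapsing homogeneous groups of regression coefficients" over a graph can mean, the sketch below evaluates a penalized least squares objective with two terms: one that discourages nonzero coefficients and one that discourages unequal coefficient magnitudes across graph edges. The truncated L1 form `min(|z| / tau, 1)`, the function names, the tuning values and the toy data are all assumptions made for this example; the abstract does not specify the penalty, and this is not the authors' algorithm, only an evaluation of one plausible objective.

```python
def truncated_l1(z, tau):
    """Truncated L1 surrogate min(|z| / tau, 1); an assumed penalty form."""
    return min(abs(z) / tau, 1.0)


def graph_penalized_objective(X, y, beta, edges, lam1, lam2, tau):
    """Squared-error loss plus selection and grouping penalties (illustrative)."""
    # Squared-error loss: 0.5 * sum_i (y_i - x_i . beta)^2
    loss = 0.0
    for row, target in zip(X, y):
        pred = sum(x * b for x, b in zip(row, beta))
        loss += 0.5 * (target - pred) ** 2
    # Selection term: each coefficient well away from zero costs about 1.
    selection = sum(truncated_l1(b, tau) for b in beta)
    # Grouping term: each edge whose endpoints differ in magnitude costs about 1.
    grouping = sum(
        truncated_l1(abs(beta[j]) - abs(beta[k]), tau) for j, k in edges
    )
    return loss + lam1 * selection + lam2 * grouping


# Hypothetical toy data: three predictors on the path graph 0 - 1 - 2.
X = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
y = [2.0, 1.0, 1.0]
edges = [(0, 1), (1, 2)]

grouped = [1.0, 1.0, 0.0]    # nodes 0 and 1 share one magnitude, node 2 is dropped
ungrouped = [1.2, 0.8, 0.0]  # nodes 0 and 1 have different magnitudes

obj_grouped = graph_penalized_objective(X, y, grouped, edges, 0.5, 0.5, 0.1)
obj_ungrouped = graph_penalized_objective(X, y, ungrouped, edges, 0.5, 0.5, 0.1)
print(obj_grouped, obj_ungrouped)
```

On this toy input the grouped coefficient vector attains the lower objective value, because it fits the data and pays the grouping penalty on only one edge. A truncated penalty of this kind is nonconvex but can be written as a difference of two convex functions, which is the setting in which difference-of-convex methods are typically applied.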