Kim Sunkyung, Pan Wei, Shen Xiaotong
Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55405, U.S.A.
Biometrics. 2013 Sep;69(3):582-93. doi: 10.1111/biom.12035. Epub 2013 Jul 3.
Penalized regression approaches are attractive for dealing with high-dimensional data, such as those arising in high-throughput genomic studies. New methods have been introduced that utilize the network structure of predictors, for example, gene networks, to improve parameter estimation and variable selection. All existing network-based penalized methods assume that the parameters (e.g., regression coefficients) of neighboring nodes in a network are close in magnitude, an assumption that may not hold. Here we propose a novel penalized regression method based on a weaker prior assumption: the parameters of neighboring nodes in a network are likely to be zero (or non-zero) at the same time, regardless of their specific magnitudes. We propose a novel non-convex penalty function to incorporate this prior, together with an algorithm based on difference convex (DC) programming. We use simulated data and two breast cancer gene expression datasets to demonstrate the advantages of the proposed methods over some existing methods. Our proposed methods can also be applied to more general group variable selection problems.
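The abstract does not state the form of the proposed penalty, so the sketch below only illustrates the general DC-programming idea on a simpler non-convex penalty. It is an assumption chosen for illustration, not the authors' method: it uses the truncated L1 penalty min(|b_j|/tau, 1), which splits as |b_j|/tau minus max(|b_j|/tau - 1, 0), a difference of two convex functions of |b_j|. It omits any network term. Linearizing the subtracted convex part at the current estimate yields a weighted lasso subproblem, solved here by coordinate descent. The function names (`weighted_lasso_cd`, `tlp_dc`, `tlp_objective`) and the parameter values are hypothetical.

```python
import numpy as np


def tlp_objective(X, y, beta, lam, tau):
    """(1/(2n)) ||y - X b||^2 + lam * sum_j min(|b_j| / tau, 1)  (illustrative penalty)."""
    return 0.5 * np.mean((y - X @ beta) ** 2) + lam * np.sum(np.minimum(np.abs(beta) / tau, 1.0))


def weighted_lasso_cd(X, y, weights, beta0, n_iter=200):
    """Cyclic coordinate descent for the convex subproblem
    (1/(2n)) ||y - X b||^2 + sum_j weights[j] * |b_j|.
    Assumes no column of X is identically zero."""
    n, p = X.shape
    beta = beta0.astype(float)          # astype returns a copy
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) ||X_j||^2
    resid = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # (1/n) X_j^T (residual with coordinate j's contribution added back)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding: exact minimizer along coordinate j
            new = np.sign(rho) * max(abs(rho) - weights[j], 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def tlp_dc(X, y, lam, tau, max_dc_iter=20, tol=1e-8):
    """DC iterations: each step minimizes a convex majorizer of tlp_objective."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(max_dc_iter):
        # Linearizing max(|b_j|/tau - 1, 0) at the current estimate removes the
        # L1 weight for coefficients already larger than tau in magnitude.
        weights = np.where(np.abs(beta) > tau, 0.0, lam / tau)
        new_beta = weighted_lasso_cd(X, y, weights, beta)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Usage on synthetic data (hypothetical settings)
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
true_beta = np.array([2.0, -1.5, 1.0] + [0.0] * 7)
y = X @ true_beta + 0.1 * rng.standard_normal(n)
beta_hat = tlp_dc(X, y, lam=0.05, tau=0.1)
print(np.round(beta_hat, 2))
```

Starting from zero, the first DC step is an ordinary lasso; later steps stop penalizing coefficients whose magnitude already exceeds tau, which reduces the lasso's shrinkage bias while the objective is non-increasing across steps. Adapting this scheme to a network-based penalty of the kind the abstract describes would require replacing the weighted lasso subproblem with a solver for the corresponding convex surrogate.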