Lee Mihee, Shen Haipeng, Huang Jianhua Z, Marron J S
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
Biometrics. 2010 Dec;66(4):1087-95. doi: 10.1111/j.1541-0420.2010.01392.x.
Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering, that is, for identifying interpretable row-column associations within high-dimensional data matrices. SSVD seeks a low-rank, checkerboard-structured approximation to a data matrix. The desired checkerboard structure is achieved by forcing both the left and right singular vectors to be sparse, that is, to have many zero entries. By interpreting the singular vectors as regression coefficient vectors for certain linear regressions, sparsity-inducing regularization penalties are imposed on the least squares regressions to produce sparse singular vectors. An efficient iterative algorithm is proposed for computing the sparse singular vectors, along with some discussion of penalty parameter selection. A lung cancer microarray dataset and a food nutrition dataset are used to illustrate SSVD as a biclustering method. SSVD is also compared with some existing biclustering methods using simulated datasets.
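The iterative algorithm described above alternates between two regressions: with v fixed, u is fit by penalized least squares against the response Xv, and with u fixed, v is fit against Xᵀu. The pure-Python sketch below illustrates that idea for a single rank-one layer under simplifying assumptions. It uses plain soft-thresholding (the solution of an ℓ1/lasso-penalized least-squares step) with fixed, user-supplied thresholds `t_u` and `t_v` instead of the paper's own penalties and penalty-parameter selection, and it initializes from unpenalized power iterations as a stand-in for an ordinary SVD. The function names and the demo matrix are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def soft_threshold(z: Vector, t: float) -> Vector:
    """Elementwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in z]


def normalize(z: Vector) -> Vector:
    """Scale z to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in z))
    if norm == 0.0:
        raise ValueError("threshold too large: every entry was set to zero")
    return [x / norm for x in z]


def mat_vec(X: Matrix, v: Vector) -> Vector:
    """Compute X v."""
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]


def mat_t_vec(X: Matrix, u: Vector) -> Vector:
    """Compute X^T u."""
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]


def sparse_rank_one(X: Matrix, t_u: float, t_v: float,
                    n_init: int = 50, max_iter: int = 100,
                    tol: float = 1e-10) -> Tuple[float, Vector, Vector]:
    """Hypothetical sketch: sparse rank-one approximation X ~ s * u v^T."""
    # Assumption: unpenalized power iterations approximate the leading
    # singular pair and serve as the starting point.
    v = normalize([1.0] * len(X[0]))
    for _ in range(n_init):
        u = normalize(mat_vec(X, v))
        v = normalize(mat_t_vec(X, u))
    u = normalize(mat_vec(X, v))

    # Alternate: threshold the regression fit for u given v, then v given u.
    for _ in range(max_iter):
        u_new = normalize(soft_threshold(mat_vec(X, v), t_u))
        v_new = normalize(soft_threshold(mat_t_vec(X, u_new), t_v))
        delta = max(abs(a - b) for a, b in zip(u_new + v_new, u + v))
        u, v = u_new, v_new
        if delta < tol:
            break

    # Singular value estimate s = u^T X v.
    s = sum(u[i] * X[i][j] * v[j]
            for i in range(len(X)) for j in range(len(X[0])))
    return s, u, v


# Hypothetical demo: a strong 2 x 2 block plus small background values.
X_demo = [
    [3.0, 3.0, 0.1, 0.0],
    [3.0, 3.0, 0.0, 0.1],
    [0.1, 0.0, 0.2, 0.0],
    [0.0, 0.1, 0.0, 0.2],
]
s_demo, u_demo, v_demo = sparse_rank_one(X_demo, t_u=0.5, t_v=0.5)
print(s_demo, u_demo, v_demo)
```

On the demo matrix the thresholds zero out rows and columns 3 and 4, so the nonzero supports of u and v mark a 2 × 2 bicluster, and s u vᵀ with s ≈ 6 reproduces that block. Further layers could be obtained by subtracting s u vᵀ from X and repeating; this deflation step is assumed here for illustration.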