Kim Hyunsoo, Park Haesun
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Bioinformatics. 2007 Jun 15;23(12):1495-502. doi: 10.1093/bioinformatics/btm134. Epub 2007 May 5.
Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when high-dimensional data are approximated in a lower-dimensional space and the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix needs to be controlled.
In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance in less computing time than other existing NMF algorithms.
The software is available as supplementary material.
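The abstract does not state the formulation itself. As a minimal sketch of what alternating non-negativity-constrained least squares (ANLS) could look like, assume that sparsity is imposed on the coefficient matrix H through a squared L1-norm penalty on each of its columns, with a Frobenius-norm penalty on the basis matrix W to keep it bounded, so that the objective is ||A - WH||_F^2 + eta ||W||_F^2 + beta sum_j ||H(:,j)||_1^2 subject to W, H >= 0. Under that assumption, each half-step becomes an ordinary non-negative least squares problem on a row-augmented matrix. The function names, the parameters `beta` and `eta`, their default values, and the fixed iteration count are illustrative choices and are not taken from the authors' software.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_columns(C, B):
    """Solve min ||C X - B||_F subject to X >= 0, one column of B at a time."""
    X = np.zeros((C.shape[1], B.shape[1]))
    for j in range(B.shape[1]):
        X[:, j], _ = nnls(C, B[:, j])
    return X


def sparse_nmf_anls(A, k, beta=0.1, eta=0.1, n_iter=50, seed=0):
    """Sketch of sparse NMF A ~ W H via ANLS (assumed penalties, see text).

    A is an m x n non-negative matrix, k is the reduced dimension.
    Runs a fixed number of alternations instead of a convergence test.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k))  # random non-negative starting point
    for _ in range(n_iter):
        # H-step: the appended row sqrt(beta) * [1 ... 1] contributes
        # beta * (column sum of H)^2, i.e. the squared L1 norm since H >= 0.
        C = np.vstack([W, np.sqrt(beta) * np.ones((1, k))])
        B = np.vstack([A, np.zeros((1, n))])
        H = nnls_columns(C, B)
        # W-step: the appended block sqrt(eta) * I contributes eta * ||W||_F^2.
        C = np.vstack([H.T, np.sqrt(eta) * np.eye(k)])
        B = np.vstack([A.T, np.zeros((k, m))])
        W = nnls_columns(C, B).T
    return W, H
```

With a genes-by-samples matrix `A`, a call such as `W, H = sparse_nmf_anls(A, k=3)` returns non-negative factors. One common way to read off clusters is to assign sample `j` to `H[:, j].argmax()`; whether that matches the paper's clustering procedure is likewise an assumption here.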