IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1835-1849. doi: 10.1109/TNNLS.2017.2676817. Epub 2017 Apr 11.
Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which can benefit many tasks, such as document-word co-clustering. However, traditional SNMF typically assumes that the number of latent factors (i.e., the dimensionality of the factor matrices) is fixed, which makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework that mitigates this issue by using dependent Indian buffet processes (dIBPs). We apply a correlation function when generating the two stick weights associated with each column pair of the factor matrices, while preserving the marginal distribution that the IBP specifies for each of them. As a consequence, the two factor matrices are generated with columnwise correlation. Under this framework, two classes of correlation function are proposed: 1) a bivariate Beta distribution and 2) a copula function. Compared with single IBP-based NMF, the proposed framework makes both factor matrices nonparametric and sparse at the same time, so it can be applied to broader scenarios, such as co-clustering. It is also much more flexible than Gaussian process-based and hierarchical Beta process-based dIBPs, in that it allows the two corresponding binary matrix columns to vary more in their nonzero entries. Our experiments on synthetic data show the merits of the proposed framework compared with state-of-the-art models with respect to factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate its efficiency in document-word co-clustering tasks.
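The copula-based coupling of stick weights described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a truncated stick-breaking construction of the IBP, in which each stick weight is a running product of Beta(alpha, 1) draws, and it assumes a Gaussian copula with correlation `rho` as the correlation function. The function names, the truncation level `num_factors`, and all parameter values are hypothetical. Because the copula only couples the uniform variates, each sequence on its own still has the Beta(alpha, 1) marginals of a single IBP.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def correlated_stick_weights(num_factors, alpha_a, alpha_b, rho, rng):
    """Draw two truncated stick-weight sequences coupled by a Gaussian copula.

    Assumption: stick-breaking IBP, weight_k = product of Beta(alpha, 1) draws.
    """
    weights_a, weights_b = [], []
    stick_a = stick_b = 1.0
    for _ in range(num_factors):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Copula step: map them to correlated uniforms on [0, 1].
        u1, u2 = normal_cdf(z1), normal_cdf(z2)
        # The inverse CDF of Beta(alpha, 1) is u ** (1 / alpha), so each
        # factor keeps its Beta(alpha, 1) marginal despite the coupling.
        stick_a *= u1 ** (1.0 / alpha_a)
        stick_b *= u2 ** (1.0 / alpha_b)
        weights_a.append(stick_a)
        weights_b.append(stick_b)
    return weights_a, weights_b


def sample_binary_matrix(num_rows, weights, rng):
    """Sample a binary matrix whose column k is Bernoulli(weights[k])."""
    return [[1 if rng.random() < w else 0 for w in weights]
            for _ in range(num_rows)]


rng = random.Random(0)
w_docs, w_words = correlated_stick_weights(8, 2.0, 2.0, 0.9, rng)
Z_docs = sample_binary_matrix(5, w_docs, rng)    # e.g. documents x factors
Z_words = sample_binary_matrix(7, w_words, rng)  # e.g. words x factors
```

With `rho` close to 1, a factor that is active in many rows of one matrix tends to be active in many rows of the other. With `rho = 0`, the two matrices reduce to independent truncated IBP draws. The nonnegative factor values that would multiply these binary masks are left out of the sketch.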