Fu Yunhui, Matsushima Shin, Yamanishi Kenji
The Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku 113-8656, Japan.
The Department of General Systems Studies, Graduate School of Arts and Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro-ku 153-8902, Japan.
Entropy (Basel). 2019 Jun 27;21(7):632. doi: 10.3390/e21070632.
Non-negative tensor factorization (NTF) is a widely used multi-way analysis approach that factorizes a high-order non-negative data tensor into several non-negative factor matrices. In NTF, the non-negative rank must be fixed in advance to specify the model, and it strongly influences the resulting factor matrices. However, its value is conventionally determined by specialists' insight or by trial and error. This paper proposes a novel rank selection criterion for NTF based on the minimum description length (MDL) principle. Our methodology is unique in that (1) we apply the MDL principle to overcome a problem caused by the imbalance between the number of elements in a data tensor and that in the factor matrices, and (2) we employ the normalized maximum likelihood (NML) code-length for histogram densities. Using synthetic and real data, we empirically demonstrate that our method outperforms other criteria in accuracy, both for estimating true ranks and for completing missing values. We further show that our method can produce ranks suitable for knowledge discovery.
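To make the role of the rank concrete, the following is a minimal sketch of a three-way non-negative CP-style factorization fitted with standard multiplicative updates for squared error. It is an illustration only and not the authors' procedure: the function names (`khatri_rao`, `unfold`, `reconstruct`, `ntf`, `relative_error`), the tensor sizes, and the iteration count are assumptions made for this example, and the MDL/NML criterion itself is not implemented. It only shows that `rank` has to be supplied before fitting.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    r = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, r)

def unfold(x, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def reconstruct(factors):
    """Rebuild a 3-way tensor from its three factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def relative_error(x, factors):
    """Frobenius norm of the residual divided by the norm of the data."""
    return np.linalg.norm(x - reconstruct(factors)) / np.linalg.norm(x)

def ntf(x, rank, n_iter=300, eps=1e-12, seed=0):
    """Fit non-negative factors of a 3-way tensor for a rank chosen in advance."""
    rng = np.random.default_rng(seed)
    # Strictly positive start so multiplicative updates can move every entry.
    factors = [rng.random((dim, rank)) + 0.1 for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # (KR^T KR) equals the element-wise product of the two Gram matrices.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            numer = unfold(x, mode) @ kr
            denom = factors[mode] @ gram + eps
            factors[mode] *= numer / denom  # keeps every entry non-negative
    return factors

# Synthetic tensor built from hypothetical rank-3 non-negative factors.
rng = np.random.default_rng(1)
true_factors = [rng.random((dim, 3)) for dim in (6, 7, 8)]
data = reconstruct(true_factors)

for rank in range(1, 6):
    fitted = ntf(data, rank)
    print(rank, relative_error(data, fitted))
```

On data like this, the fit error typically keeps shrinking or flattens as the rank grows, so reconstruction error alone does not identify the rank. That is the gap a complexity-aware criterion such as the MDL-based one described in the abstract is meant to fill.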