Wan Changlin, Dang Pengtao, Zhao Tong, Zang Yong, Zhang Chi, Cao Sha
Indiana University, Indianapolis, Indiana, United States.
Purdue University, West Lafayette, Indiana, United States.
Proc Mach Learn Res. 2022 Aug;180:2035-2044.
Boolean matrix factorization (BMF) is a combinatorial problem that arises in a wide range of applications, including recommender systems, collaborative filtering, and dimensionality reduction. Existing BMF methods typically assume a homoscedastic noise model; in real-world data, however, the deviations of observed values from their true values almost surely vary across entries because of stochastic noise, so data points are not equally suitable for fitting a model. Treating all data points as identically distributed is therefore not ideal. Motivated by this observation, we introduce a probabilistic BMF model, called bias-aware BMF (BABF), that accounts for object-wise and feature-wise bias distributions separately. To the best of our knowledge, BABF is the first Boolean decomposition approach that considers both feature-wise and object-wise bias in binary data. We conducted experiments on datasets with different levels of background noise, bias levels, and signal pattern sizes to test the effectiveness of our method in various scenarios. Our model outperforms state-of-the-art factorization methods in both accuracy and efficiency when recovering the original datasets, and the correlation between the inferred bias level and the true bias is highly significant in both simulated and real-world datasets.
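To make the setting concrete, the following minimal sketch shows a Boolean matrix product and a toy way of contaminating it with object-wise (row) and feature-wise (column) bias. It is an illustration only, not the BABF model or its inference procedure: the function names `boolean_product` and `add_biased_background`, and the assumed background probability 1 - (1 - r_i)(1 - c_j) for a zero entry in row i and column j, are hypothetical choices that do not come from the paper.

```python
import random


def boolean_product(A, B):
    """Boolean matrix product: X[i][j] = OR over k of (A[i][k] AND B[k][j])."""
    inner = len(B)
    cols = len(B[0])
    return [
        [int(any(row[k] and B[k][j] for k in range(inner))) for j in range(cols)]
        for row in A
    ]


def add_biased_background(X, row_bias, col_bias, seed=0):
    """Turn zeros into ones with a probability that depends on row and column.

    ASSUMPTION (illustrative, not the paper's noise model): a zero entry (i, j)
    becomes one with probability 1 - (1 - row_bias[i]) * (1 - col_bias[j]).
    Entries that are already one are left unchanged.
    """
    rng = random.Random(seed)
    noisy = []
    for i, row in enumerate(X):
        new_row = []
        for j, value in enumerate(row):
            p = 1.0 - (1.0 - row_bias[i]) * (1.0 - col_bias[j])
            background = 1 if rng.random() < p else 0
            new_row.append(value | background)
        noisy.append(new_row)
    return noisy


# Two binary factors: 3 objects x 2 patterns, and 2 patterns x 3 features.
A = [[1, 0], [0, 1], [1, 1]]
B = [[1, 1, 0], [0, 1, 1]]

X = boolean_product(A, B)  # [[1, 1, 0], [0, 1, 1], [1, 1, 1]]

# Heteroscedastic background: the first object and the last feature are noisier.
observed = add_biased_background(X, row_bias=[0.3, 0.0, 0.0], col_bias=[0.0, 0.0, 0.2])
```

Under this toy model, the background only adds ones, so every entry of `X` that equals one is preserved in `observed`. A homoscedastic method would treat the extra ones in the first row and the last column as uniform noise, whereas a bias-aware method would attribute them to the corresponding object and feature.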