School of Mathematics, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, PR China; School of Big Data, Fuzhou University of International Studies and Trade, Fuzhou, Fujian, PR China.
School of Mathematics, China University of Mining and Technology, Xuzhou, 221116, Jiangsu, PR China.
Neural Netw. 2024 Nov;179:106628. doi: 10.1016/j.neunet.2024.106628. Epub 2024 Aug 10.
Dictionary learning is an important sparse representation algorithm that has been widely used in machine learning and artificial intelligence. However, for the massive data sets of the big data era, classical dictionary learning algorithms are computationally expensive and can even be infeasible. To overcome this difficulty, we propose new dictionary learning methods based on randomized algorithms. The contributions of this work are as follows. First, we observe that the dictionary matrix is often numerically low-rank. Based on this property, we apply the randomized singular value decomposition (RSVD) to the dictionary matrix and propose a randomized algorithm for linear dictionary learning. Compared with the classical K-SVD algorithm, an advantage is that all the elements of the dictionary matrix can be updated simultaneously. Second, to the best of our knowledge, there are few theoretical results on why the matrix computation problems involved in dictionary learning can be solved inexactly. To fill this gap, we show the rationality of this randomized algorithm with inexact solving from the viewpoint of matrix perturbation analysis. Third, based on the numerically low-rank property and a Nyström approximation of the kernel matrix, we propose a randomized kernel dictionary learning algorithm, and we bound the distance between the exact solution and the computed solution to show the effectiveness of the proposed algorithm. Fourth, we propose an efficient scheme for the testing stage of kernel dictionary learning. With this strategy, there is no need to explicitly form or store kernel matrices in either the training or the testing stage. Comprehensive numerical experiments are performed on several real-world data sets. The numerical results demonstrate the rationality of our strategies and show that the proposed algorithms are much more efficient than some state-of-the-art dictionary learning algorithms.
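The RSVD mentioned in the abstract refers to the standard randomized SVD technique: sample the range of the matrix with a random test matrix, orthonormalize, and compute a small exact SVD in the reduced subspace. The sketch below (in Python/NumPy rather than the authors' MATLAB, and illustrating the generic technique rather than the paper's exact algorithm) shows why it is cheap and accurate for numerically low-rank matrices such as the dictionary matrix; the function name `rsvd` and the oversampling parameter `p` are illustrative choices, not from the paper.

```python
import numpy as np

def rsvd(A, k, p=10, seed=0):
    """Approximate rank-k SVD of A via randomized range sampling.

    A generic randomized-SVD sketch (Gaussian test matrix, oversampling
    p), not the paper's specific algorithm.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k + p))
    Y = A @ Omega
    # Orthonormal basis for the sampled range.
    Q, _ = np.linalg.qr(Y)
    # Project A onto the subspace and take a small exact SVD there.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# A numerically low-rank matrix is recovered almost exactly:
X = np.random.default_rng(1).standard_normal((200, 20))
A = X @ X.T                      # rank at most 20
U, s, Vt = rsvd(A, k=20)
err = np.linalg.norm(A - U @ (s[:, None] * Vt)) / np.linalg.norm(A)
```

The dominant cost is the matrix product `A @ Omega`, which for an m-by-n matrix and target rank k is O(mnk) instead of the O(mn·min(m, n)) of a full SVD; this is the source of the speedup the abstract describes.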
The MATLAB codes of the proposed algorithms are publicly available from https://github.com/Jiali-yang/RALDL_RAKDL.