IEEE Trans Neural Netw Learn Syst. 2015 Sep;26(9):1913-26. doi: 10.1109/TNNLS.2014.2361052. Epub 2014 Oct 16.
Sparse representations using learned dictionaries are being used with increasing success in several data processing and machine learning applications. The growing need to learn sparse models in large-scale applications motivates the development of efficient, robust, and provably good dictionary learning algorithms. Algorithmic stability and generalizability are desirable characteristics for dictionary learning algorithms that aim to build global dictionaries, which can efficiently model any test data similar to the training samples. In this paper, we propose an algorithm to learn dictionaries for sparse representations from large-scale data, and prove that the proposed learning algorithm is stable and generalizable asymptotically. The algorithm employs a 1-D subspace clustering procedure, the K-hyperline clustering, to learn a hierarchical dictionary with multiple levels. We also propose an information-theoretic scheme to estimate the number of atoms needed in each level of learning and develop an ensemble approach to learn robust dictionaries. Using the proposed dictionaries, the sparse code for novel test data can be computed using a low-complexity pursuit procedure. We demonstrate the stability and generalization characteristics of the proposed algorithm using simulations. We also evaluate the utility of the multilevel dictionaries in compressed recovery and subspace learning applications.
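The core building block named in the abstract, K-hyperline clustering, fits K one-dimensional subspaces (lines through the origin) to the data: each sample is assigned to the line with which it has the largest absolute correlation, and each line direction is re-estimated as the top singular vector of its assigned samples. The sketch below is an illustrative NumPy implementation of this generic procedure, not the authors' code; the function name `k_hyperline` and all parameter defaults are assumptions for illustration.

```python
import numpy as np

def k_hyperline(X, K, n_iter=50, seed=0):
    """Illustrative K-hyperline clustering sketch (not the paper's code).

    Fits K unit-norm direction vectors (1-D subspaces through the origin)
    to the columns of X (shape d x n). Returns the d x K matrix of
    directions and the per-sample cluster labels.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Initialize directions from randomly chosen, normalized data samples.
    idx = rng.choice(n, size=K, replace=False)
    D = X[:, idx].astype(float).copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: each sample joins the line maximizing |<x, d_k>|.
        corr = np.abs(D.T @ X)            # K x n absolute correlations
        labels = np.argmax(corr, axis=0)
        # Update step: each direction becomes the top left singular
        # vector of the samples assigned to it.
        for k in range(K):
            Xk = X[:, labels == k]
            if Xk.shape[1] == 0:
                continue  # keep the previous direction for empty clusters
            u, _, _ = np.linalg.svd(Xk, full_matrices=False)
            D[:, k] = u[:, 0]
    return D, labels
```

In the multilevel scheme described above, a procedure of this kind would be applied level by level: the residuals left after projecting each sample onto its assigned line at one level become the training data for the next level, yielding the hierarchical dictionary.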