IEEE Trans Pattern Anal Mach Intell. 2017 Jan;39(1):47-60. doi: 10.1109/TPAMI.2016.2539946. Epub 2016 Mar 9.
This paper studies the problem of recovering authentic samples that lie on a union of multiple subspaces from their corrupted observations. Because today's data are typically high-dimensional and massive, it is arguable that the target matrix to recover (i.e., the authentic sample matrix) is often low-rank. In this case, the recently established Robust Principal Component Analysis (RPCA) method already provides a convenient way to recover mixture data. In general, however, RPCA is not good enough, because the incoherence condition it assumes is not fully consistent with the mixture structure of multiple subspaces: as the number of subspaces grows, the row-coherence of the data keeps increasing and, accordingly, the performance of RPCA degrades. To overcome the challenges arising from mixture data, we suggest considering Low-Rank Representation (LRR). We show that LRR can handle mixture data well, as long as its dictionary is configured appropriately. More precisely, we prove mathematically that LRR can weaken the dependence on the row-coherence, provided that the dictionary is well-conditioned and its rank is not too high. In particular, if the dictionary itself is sufficiently low-rank, then the dependence on the row-coherence can be removed completely. These results provide some elementary principles for dictionary learning and lead naturally to a practical algorithm for recovering mixture data. Our experiments on randomly generated matrices and real motion sequences show promising results.
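For orientation, the two convex programs contrasted in the abstract are commonly written as below. This is a sketch under assumptions and not a quotation of the paper's equations: the symbols X (observed matrix), L (low-rank part), S (sparse corruption), A (dictionary), Z (representation) and the weight \lambda are notation introduced here, and the choice of the \ell_1 norm for the corruption term is assumed.

```latex
% RPCA: split the observations into a low-rank part and a sparse part
\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad X = L + S

% LRR with dictionary A: the recovered matrix is A Z^{\star}
\min_{Z,\,S}\; \|Z\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad X = A Z + S
```

With A equal to the identity, the second program reduces to the first, so in this notation the role of the dictionary is exactly what separates LRR from RPCA.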
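To make the notion of row-coherence concrete, the following minimal sketch computes it for synthetic mixture data. It assumes the standard definition, (n / r) times the largest squared row norm of the matrix of right singular vectors, for an m x n matrix of rank r. The function names `row_coherence` and `union_of_subspaces`, the sizes, and the Gaussian sampling scheme are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def row_coherence(L, tol=1e-10):
    """Row coherence (n / r) * max_j ||V^T e_j||^2 of an m x n matrix L.

    Assumed definition: V holds the top-r right singular vectors of L.
    """
    _, s, vt = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))       # numerical rank
    v = vt[:r].T                          # n x r matrix of right singular vectors
    n = L.shape[1]
    return (n / r) * np.max(np.sum(v**2, axis=1))

def union_of_subspaces(m, n, k, d, rng):
    """Draw n points in R^m, split evenly over k random d-dimensional subspaces."""
    per = n // k
    blocks = []
    for _ in range(k):
        basis = np.linalg.qr(rng.standard_normal((m, d)))[0]   # m x d orthonormal basis
        blocks.append(basis @ rng.standard_normal((d, per)))   # samples in that subspace
    return np.hstack(blocks)

rng = np.random.default_rng(0)
for k in (1, 2, 5, 10):
    X = union_of_subspaces(m=100, n=200, k=k, d=5, rng=rng)
    print(k, row_coherence(X))
```

By construction the value always lies between 1 and n / r. Varying `k` in the loop is one way to explore how the number of subspaces affects it for a particular sampling scheme.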