IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1525-1538. doi: 10.1109/TNNLS.2017.2671849. Epub 2017 Mar 14.
High-dimensional data in the real world are often corrupted by noise and gross outliers. Principal component analysis (PCA) fails to learn the true low-dimensional subspace in such cases, which is why robust variants of PCA, which penalize arbitrarily large outlying entries, are preferred for dimension reduction. In this paper, we argue that it is necessary to study the presence of outliers not only in the observed data matrix but also in the orthogonal complement subspace of the authentic principal subspace. In fact, the latter can seriously skew the estimation of the principal components. We design a reinforced robustification of principal component pursuit that detects both types of outliers and eliminates their influence on the final subspace estimate. Simulation results under different design settings clearly show the superiority of the proposed method over other popular implementations of robust PCA. This paper also showcases applications of the method in challenging scenarios of face recognition and video background subtraction. Along with recovering a usable low-dimensional subspace from real-world data sets, the technique captures semantically meaningful outliers.
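For context, the baseline that the proposed method reinforces is principal component pursuit (PCP), which decomposes the observed matrix M into a low-rank part L (the principal subspace) plus a sparse part S (the gross outliers) by minimizing ||L||_* + λ||S||_1 subject to M = L + S. Below is a minimal NumPy sketch of the standard inexact augmented Lagrange multiplier (ALM) solver for PCP; it illustrates the baseline only, not the paper's reinforced variant, and the function names and parameter defaults are illustrative choices, not from the paper.

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Principal component pursuit: M ≈ L + S with L low-rank, S sparse.

    Solved with the inexact ALM method. Default lam = 1/sqrt(max(m, n))
    is the usual theoretical choice; mu follows a common heuristic.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = 0.25 * m * n / np.abs(M).sum()
    S = np.zeros_like(M)   # sparse outlier component
    Y = np.zeros_like(M)   # Lagrange multiplier matrix
    norm_M = np.linalg.norm(M, 'fro')
    for _ in range(max_iter):
        # Alternate the two proximal updates, then update the multiplier.
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        R = M - L - S
        Y = Y + mu * R
        if np.linalg.norm(R, 'fro') <= tol * norm_M:
            break
    return L, S
```

On a matrix that is genuinely low-rank plus sparsely corrupted, this recovers both components; the paper's point is that PCP-style penalties handle outliers in the data matrix, while outliers in the orthogonal complement of the true subspace require the additional robustification proposed here.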