Brooks JP, Dulá JH, Boone EL
Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, VA 23284.
Comput Stat Data Anal. 2013 May 1;61:83-98. doi: 10.1016/j.csda.2012.11.007.
The L1 norm has been applied in numerous variations of principal component analysis (PCA). L1-norm PCA is an attractive alternative to traditional L2-based PCA because it can impart robustness in the presence of outliers and is indicated for models where standard Gaussian assumptions about the noise may not apply. Of all the previously proposed PCA schemes that recast PCA as an optimization problem involving the L1 norm, none provides globally optimal solutions in polynomial time. This paper proposes an L1-norm PCA procedure based on the efficient calculation of the optimal solution of the L1-norm best-fit hyperplane problem. We present a procedure called L1-PCA*, based on the application of this idea, that fits data to subspaces of successively smaller dimension. The procedure is implemented and tested on a diverse problem suite. Our tests show that L1-PCA* is the indicated procedure in the presence of unbalanced outlier contamination.
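The abstract does not spell out how the best-fit hyperplane is computed, so the following is only an illustrative sketch and not the authors' implementation. It relies on the fact that the L1 distance from a point to a hyperplane is attained by moving the point along a single coordinate axis, so a best fit can be sought by treating each coordinate in turn as the dependent variable in a least-absolute-deviations regression and keeping the fit with the smallest total error. The sketch assumes two-dimensional data and a line through the origin, in which case each regression reduces to a weighted median. The function names `lad_slope` and `l1_best_fit_line` are hypothetical.

```python
def lad_slope(xs, ys):
    """Slope b minimizing sum(|y - b*x|) with no intercept.

    Since |y - b*x| = |x| * |y/x - b|, the minimizer is a weighted median
    of the ratios y/x with weights |x|. Points with x == 0 do not depend on b.
    """
    pairs = sorted((y / x, abs(x)) for x, y in zip(xs, ys) if x != 0)
    if not pairs:
        return 0.0
    half = sum(w for _, w in pairs) / 2.0
    acc = 0.0
    for ratio, weight in pairs:
        acc += weight
        if acc >= half:
            return ratio
    return pairs[-1][0]


def l1_best_fit_line(points):
    """Return (normal, error) for a line through the origin in 2-D.

    The line is {p : normal[0]*p[0] + normal[1]*p[1] == 0}, and error is the
    sum of L1 distances from the points to that line.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    # Candidate 1: y is the dependent variable, y ~ b*x.
    b = lad_slope(xs, ys)
    err_y = sum(abs(y - b * x) for x, y in points)

    # Candidate 2: x is the dependent variable, x ~ c*y.
    c = lad_slope(ys, xs)
    err_x = sum(abs(x - c * y) for x, y in points)

    if err_y <= err_x:
        return (b, -1.0), err_y
    return (-1.0, c), err_x


# Four points on y = 2x plus one outlier at (5, -10).
pts = [(1, 2), (2, 4), (3, 6), (4, 8), (5, -10)]
normal, error = l1_best_fit_line(pts)
# normal == (-1.0, 0.5), i.e. the line y = 2x; error == 10.0
```

The outlier does not pull the fitted line away from y = 2x, which illustrates the robustness the abstract refers to. In higher dimensions each regression would be a linear program rather than a weighted median, and the full L1-PCA* procedure would repeat such a fit on successively lower-dimensional projections; neither step is shown here.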