Robust principal component analysis by self-organizing rules based on statistical physics approach
Xu L, Yuille AL
Dept. of Comput. Sci., Chinese Univ. of Hong Kong, Shatin.
IEEE Trans Neural Netw. 1995;6(1):131-43. doi: 10.1109/72.363442.
This paper applies statistical physics to the problem of robust principal component analysis (PCA). The commonly used PCA learning rules are first related to energy functions. These functions are generalized by adding a binary decision field with a given prior distribution, so that outliers in the data are dealt with explicitly and PCA is made robust. Each of the generalized energy functions is then used to define a Gibbs distribution, from which a marginal distribution is obtained by summing over the binary decision field. The marginal distribution defines an effective energy function, from which self-organizing rules have been developed for robust PCA. In the presence of outliers, both the standard PCA methods and the existing self-organizing PCA rules studied in the neural network literature perform quite poorly. By contrast, the robust rules proposed here resist outliers well and perform excellently on various PCA-like tasks, such as obtaining the first principal component vector, obtaining the first k principal component vectors, and directly finding the subspace spanned by the first k principal component vectors without solving for each vector individually. Comparative experiments show that the authors' robust rules significantly improve the performance of existing PCA algorithms when outliers are present.
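To make the mechanism concrete, the following is a minimal sketch, not the authors' published rule: it assumes an Oja-style update for the first principal component in which each sample is softly down-weighted by a logistic function of its reconstruction error, the kind of weight that arises when a Gibbs distribution is summed over a binary outlier indicator. The function name, the threshold heuristic, and all parameter defaults are illustrative assumptions.

```python
import numpy as np

def robust_pca_first_component(X, lr=0.01, temperature=1.0, threshold=None,
                               n_epochs=50, seed=0):
    """Sketch of a robust, Oja-style self-organizing rule for the first
    principal component. Each sample's update is gated by a logistic weight
    in its reconstruction error, in the spirit of marginalizing a binary
    decision field; this is an illustrative assumption, not the paper's
    exact algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)

    # Hypothetical default: set the outlier threshold to the median
    # reconstruction error under the initial direction.
    if threshold is None:
        y0 = X @ w
        threshold = np.median(np.sum((X - np.outer(y0, w)) ** 2, axis=1))

    for _ in range(n_epochs):
        for x in X[rng.permutation(n)]:
            y = w @ x                             # projection onto current direction
            err = np.sum((x - y * w) ** 2)        # per-sample reconstruction error
            # Soft outlier gate: summing over the binary decision field
            # yields a logistic weight in the error (effective energy).
            gate = 1.0 / (1.0 + np.exp((err - threshold) / temperature))
            # Oja-style update, down-weighted for suspected outliers.
            w += lr * gate * y * (x - y * w)
        w /= np.linalg.norm(w)                    # keep unit length
    return w

if __name__ == "__main__":
    # Toy data: dominant first axis plus a handful of large outliers.
    rng = np.random.default_rng(1)
    clean = rng.normal(size=(300, 5)) @ np.diag([5, 2, 1, 0.5, 0.1])
    outliers = rng.normal(scale=30.0, size=(15, 5))
    X = np.vstack([clean, outliers])
    print("estimated first component:",
          np.round(robust_pca_first_component(X), 3))
```

With the gate fixed at 1 the update reduces to the standard Oja rule, which makes the role of the outlier weighting easy to isolate in experiments.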