Hu Xuelei, Xu Lei
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong, China.
Neural Netw. 2004 Oct-Nov;17(8-9):1051-9. doi: 10.1016/j.neunet.2004.07.005.
It is well known that constrained Hebbian self-organization on multiple linear neural units leads to the same k-dimensional subspace as that spanned by the first k principal components. Not only has the batch PCA algorithm been widely applied in various fields since the 1930s, but a variety of adaptive algorithms have also been proposed over the past two decades. However, most studies assume that the dimension k is known or determine it heuristically, even though a number of model selection criteria exist in the statistics literature. Recently, criteria have also been derived under the framework of Bayesian Ying-Yang (BYY) harmony learning. This paper further investigates the BYY criteria in comparison with existing typical criteria, including Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), the Bayesian inference criterion (BIC), and the cross-validation (CV) criterion. The comparison is made via experiments not only on simulated data sets with different sample sizes, noise variances, data space dimensions, and subspace dimensions, but also on two real data sets, one from an air pollution problem and one from sports track records. The experiments show that BIC outperforms AIC, CAIC, and CV, while the BYY criteria are either comparable with or better than BIC. BYY harmony learning is therefore the preferred tool for subspace dimension determination, all the more so because an appropriate subspace dimension k can be determined automatically while BYY harmony learning estimates the principal subspace, whereas selecting k by BIC, AIC, CAIC, or CV must be done in a second stage, based on a set of candidate subspaces of different dimensions obtained in a first learning stage.
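To make the contrast concrete, the sketch below illustrates the two-stage procedure that the criterion-based alternatives (AIC, BIC, etc.) require: candidate principal subspaces of every dimension k are fitted first, and a criterion is then evaluated to pick k. This is a minimal illustration under a probabilistic PCA likelihood, not the paper's implementation; the function names, parameter count, and synthetic data are assumptions made for the example.

```python
import numpy as np

def ppca_loglik(eigvals, k, n, d):
    """Maximized log-likelihood of a probabilistic PCA model with a k-dim subspace.

    eigvals: eigenvalues of the sample covariance, sorted in descending order.
    """
    sigma2 = eigvals[k:].mean()                      # ML estimate of the noise variance
    return -0.5 * n * (d * np.log(2 * np.pi)
                       + np.log(eigvals[:k]).sum()
                       + (d - k) * np.log(sigma2)
                       + d)

def select_subspace_dim(X, criterion="BIC"):
    """Two-stage selection: score every candidate dimension k, return the best one."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
    scores = {}
    for k in range(1, d):                            # first stage: all candidate subspaces
        ll = ppca_loglik(eigvals, k, n, d)
        m = d * k - k * (k - 1) // 2 + 1             # free parameters (mean omitted: constant in k)
        penalty = m * np.log(n) if criterion == "BIC" else 2 * m   # BIC vs AIC penalty
        scores[k] = -2 * ll + penalty                # second stage: criterion value
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative synthetic data: a 3-dimensional signal embedded in 10-dimensional noise.
    W = rng.normal(size=(10, 3))
    X = rng.normal(size=(500, 3)) @ W.T + 0.3 * rng.normal(size=(500, 10))
    k_best, _ = select_subspace_dim(X, criterion="BIC")
    print("selected subspace dimension:", k_best)
```

BYY harmony learning, by contrast, is reported to determine an appropriate k automatically while learning the principal subspace itself, avoiding the enumeration over candidate dimensions shown above.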