Watanabe S
Tokyo Institute of Technology, Precision & Intelligence Laboratory, Yokohama, Japan.
Neural Netw. 2001 Oct;14(8):1049-60. doi: 10.1016/s0893-6080(01)00069-7.
Hierarchical learning machines such as layered perceptrons, radial basis functions, and Gaussian mixtures are non-identifiable learning machines whose Fisher information matrices are not positive definite. As a consequence, conventional statistical asymptotic theory cannot be applied to neural network learning theory: for example, the Bayesian a posteriori probability distribution does not converge to a Gaussian distribution, and the generalization error is not proportional to the number of parameters. The purpose of this paper is to overcome this problem and to clarify the relation between the learning curve of a hierarchical learning machine and the algebraic geometrical structure of its parameter space. We establish an algorithm for calculating the Bayesian stochastic complexity based on the blowing-up technique in algebraic geometry, and we prove that the Bayesian generalization error of a hierarchical learning machine is smaller than that of a regular statistical model, even if the true distribution is not contained in the parametric model.
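To make the abstract's claim concrete, the sketch below records, in illustrative notation not quoted from the paper, the asymptotic forms that singular learning theory associates with the Bayesian stochastic complexity and generalization error, together with a toy blow-up computation; the symbols λ, m, d and the example function K(a, b) are assumptions introduced here for illustration only.

```latex
% Illustrative sketch (notation assumed, not quoted from the paper).
% F(n): Bayesian stochastic complexity after n examples; G(n): generalization error.
% \lambda (a rational number) and m (a natural number) are obtained by resolving
% the singularities of the set of true parameters with blow-ups; for a regular
% model with d parameters, \lambda = d/2 and m = 1, while singular
% (non-identifiable) models satisfy \lambda \le d/2, which is why their Bayesian
% generalization error can be smaller.
\[
  F(n) \;=\; \lambda \log n \;-\; (m-1)\,\log\log n \;+\; O(1),
  \qquad
  G(n) \;\approx\; \frac{\lambda}{n} \quad (n \to \infty).
\]
% Toy example (assumed for illustration): if the Kullback-Leibler distance to the
% true distribution is K(a,b) = a^2 b^2 near the origin, the blow-up chart
% (a,b) = (a, a b') gives K = a^4 b'^2 with Jacobian |a|, so the zeta function
%   \zeta(z) = \int K(a,b)^z \, da \, db = \int a^{4z+1}\, b'^{2z} \, da \, db'
% has its largest pole at z = -1/2 (the other chart gives the same pole by
% symmetry), i.e. \lambda = 1/2 < d/2 = 1 for this two-parameter model.
```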