Precision and Intelligence Laboratory, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ku, Yokohama 226-8503, Japan.
Neural Netw. 2010 Jan;23(1):20-34. doi: 10.1016/j.neunet.2009.08.002. Epub 2009 Aug 15.
Learning machines that have hierarchical structures or hidden variables are singular statistical models because they are nonidentifiable and their Fisher information matrices are singular. In singular statistical models, the Bayes a posteriori distribution does not converge to a normal distribution, nor does the maximum likelihood estimator satisfy asymptotic normality. This is the main reason it has been difficult to predict their generalization performance from trained states. In this paper, we study four errors, (1) the Bayes generalization error, (2) the Bayes training error, (3) the Gibbs generalization error, and (4) the Gibbs training error, and prove that there are universal mathematical relations among these errors. The formulas proved in this paper are equations of states in statistical estimation because they hold for any true distribution, any parametric model, and any a priori distribution. We also show that the Bayes and Gibbs generalization errors can be estimated from the Bayes and Gibbs training errors, and we propose widely applicable information criteria that can be applied to both regular and singular statistical models.
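The four errors and the relations among them can be stated concretely. The notation below is a sketch in my own symbols, not quoted from the abstract: q(x) is the true distribution, p(x|w) the model, E_w[.] the average over the a posteriori distribution at inverse temperature beta, and n the sample size.

```latex
% Definitions (notation assumed, not quoted from the abstract):
% q(x): true distribution, p(x|w): model, E_w[.]: a posteriori average
% at inverse temperature \beta, n: sample size.
\[
B_g = \int q(x)\log\frac{q(x)}{E_w[p(x\mid w)]}\,dx, \qquad
B_t = \frac{1}{n}\sum_{i=1}^{n}\log\frac{q(X_i)}{E_w[p(X_i\mid w)]},
\]
\[
G_g = E_w\!\left[\int q(x)\log\frac{q(x)}{p(x\mid w)}\,dx\right], \qquad
G_t = E_w\!\left[\frac{1}{n}\sum_{i=1}^{n}\log\frac{q(X_i)}{p(X_i\mid w)}\right].
\]
% The universal relations (equations of states) then take, in this
% notation, the schematic form (expectations over datasets):
\[
\mathbb{E}[B_g] = \mathbb{E}[B_t]
  + 2\beta\,\bigl(\mathbb{E}[G_t]-\mathbb{E}[B_t]\bigr) + o(1/n), \qquad
\mathbb{E}[G_g] = \mathbb{E}[G_t]
  + 2\beta\,\bigl(\mathbb{E}[G_t]-\mathbb{E}[B_t]\bigr) + o(1/n).
\]
```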
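As a numerical sanity check (my own construction, not taken from the paper, whose theorems cover singular models in full generality): for the simplest regular conjugate model, with x ~ N(w, 1), prior w ~ N(0, 1), true distribution N(0, 1), and inverse temperature beta = 1, all four errors have closed forms per dataset, and averaging over many datasets should reproduce the relations E[Bg] ≈ E[Bt] + 2(E[Gt] − E[Bt]) and E[Gg] ≈ E[Gt] + 2(E[Gt] − E[Bt]).

```python
import numpy as np

# Toy check of the equations of states (my construction; the paper's
# results are general, this is the simplest regular conjugate case).
# Model: x ~ N(w, 1), prior w ~ N(0, 1), true q = N(0, 1), beta = 1.
# The posterior is N(mu_n, sig2_n) with sig2_n = 1/(n+1) and
# mu_n = sum(x)/(n+1); the predictive is N(mu_n, 1 + sig2_n), so all
# four errors have closed forms for each dataset.
rng = np.random.default_rng(0)
n, K = 50, 40_000                      # sample size, number of datasets
X = rng.standard_normal((K, n))        # K datasets drawn from the true q

mu = X.sum(axis=1) / (n + 1)           # posterior mean, per dataset
sig2 = 1.0 / (n + 1)                   # posterior variance
s2 = 1.0 + sig2                        # predictive variance

# Bayes generalization error: KL(q || predictive), closed form for normals.
Bg = 0.5 * np.log(s2) + (1.0 + mu**2) / (2.0 * s2) - 0.5
# Bayes training error: (1/n) sum_i log[q(X_i)/p*(X_i)].
Bt = (-0.5 * X**2 + (X - mu[:, None])**2 / (2.0 * s2)).mean(axis=1) \
     + 0.5 * np.log(s2)
# Gibbs generalization error: E_w[KL(q || p(.|w))] = E_w[w^2 / 2].
Gg = (mu**2 + sig2) / 2.0
# Gibbs training error: E_w[(1/n) sum_i log[q(X_i)/p(X_i|w)]].
Gt = -X.mean(axis=1) * mu + Gg

Bg_mean, Bt_mean = Bg.mean(), Bt.mean()
Gg_mean, Gt_mean = Gg.mean(), Gt.mean()

# Equations of states at beta = 1; the gaps should be O(1/n^2) plus
# Monte Carlo noise, i.e. much smaller than the errors themselves.
eq1_gap = Bg_mean - (Bt_mean + 2.0 * (Gt_mean - Bt_mean))
eq2_gap = Gg_mean - (Gt_mean + 2.0 * (Gt_mean - Bt_mean))
print(f"E[Bg]={Bg_mean:.5f}  E[Bt]={Bt_mean:.5f}  "
      f"E[Gg]={Gg_mean:.5f}  E[Gt]={Gt_mean:.5f}")
print(f"equation-of-state gaps: {eq1_gap:.2e}, {eq2_gap:.2e}")
```

For this regular model with one parameter, E[Bg] is close to 1/(2n) and E[Bt] to −1/(2n), while the two gaps are an order of magnitude smaller, illustrating that the training errors alone determine the generalization errors through the relations.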