Department of Mathematical and Computing Science, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan.
Philos Trans A Math Phys Eng Sci. 2023 May 15;381(2247):20220151. doi: 10.1098/rsta.2022.0151. Epub 2023 Mar 27.
In statistical inference, uncertainty is unknown and all models are wrong. That is to say, a person who constructs a statistical model and a prior distribution is aware that both are fictional candidates. To study such cases, statistical measures have been constructed, such as cross validation, information criteria and marginal likelihood; however, their mathematical properties have not yet been completely clarified when statistical models are under- or over-parametrized. We introduce a framework of mathematical theory of Bayesian statistics for unknown uncertainty, which clarifies general properties of cross validation, information criteria and marginal likelihood, even if the unknown data-generating process is unrealizable by the model or the posterior distribution cannot be approximated by any normal distribution. Hence it gives a helpful standpoint for a person who cannot believe in any specific model and prior. This paper consists of three parts. The first is a new result, whereas the second and third are well-known previous results with new experiments. We show that there exists a more precise estimator of the generalization loss than leave-one-out cross validation, that there exists a more accurate approximation of the marginal likelihood than the Bayesian information criterion, and that the optimal hyperparameters for the generalization loss and the marginal likelihood are different. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
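To make the quantities named in the abstract concrete, the following is a minimal sketch of how the training loss, the widely applicable information criterion (WAIC) and importance-sampling leave-one-out cross validation are commonly computed from posterior draws. It uses the standard textbook definitions and is not the more precise estimator the paper proposes. The toy model (a normal mean with known unit variance and a normal prior), the prior scale, the sample sizes and the function names `logmeanexp` and `waic_and_loo` are all assumptions chosen for illustration.

```python
import numpy as np

def logmeanexp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def waic_and_loo(loglik):
    """loglik[s, i] = log p(x_i | w_s) for S posterior draws and n data points."""
    # Training loss: T = -(1/n) sum_i log E_w[p(x_i | w)]
    train_loss = -np.mean(logmeanexp(loglik, axis=0))
    # WAIC = T + (1/n) sum_i Var_w[log p(x_i | w)]
    waic = train_loss + np.mean(np.var(loglik, axis=0))
    # Importance-sampling LOO: (1/n) sum_i log E_w[1 / p(x_i | w)]
    loo = np.mean(logmeanexp(-loglik, axis=0))
    return train_loss, waic, loo

# Toy example (assumed): x_i ~ N(mu, 1), prior mu ~ N(0, 10^2).
rng = np.random.default_rng(0)
n, n_draws = 200, 4000
x = rng.normal(0.5, 1.0, size=n)

# The conjugate posterior for mu is normal with these parameters.
post_prec = n + 1.0 / 10.0**2
post_mean = x.sum() / post_prec
mu_draws = rng.normal(post_mean, np.sqrt(1.0 / post_prec), size=n_draws)

# Pointwise log-likelihood matrix of shape (n_draws, n).
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - mu_draws[:, None]) ** 2

train_loss, waic, loo = waic_and_loo(loglik)
print(f"training loss {train_loss:.4f}, WAIC {waic:.4f}, IS-LOO {loo:.4f}")
```

In a regular, well-specified model such as this one, the two estimates should nearly coincide and both should exceed the training loss. The cases the abstract is concerned with, where the model cannot realize the data-generating process or the posterior is far from normal, are exactly where such simple checks stop being informative.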