Parr Thomas, Friston Karl J
Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, 12 Queen Square, London, WC1N 3BG, UK.
Biol Cybern. 2019 Dec;113(5-6):495-513. doi: 10.1007/s00422-019-00805-w. Epub 2019 Sep 27.
Active inference is an approach to understanding behaviour that rests upon the idea that the brain uses an internal generative model to predict incoming sensory data. The fit between this model and data may be improved in two ways. The brain could optimise probabilistic beliefs about the variables in the generative model (i.e. perceptual inference). Alternatively, by acting on the world, it could change the sensory data such that they are more consistent with the model. This implies a common objective function (variational free energy) for action and perception that scores the fit between an internal model and the world. We compare two free energy functionals for active inference in the framework of Markov decision processes. One of these is a functional of beliefs (i.e. probability distributions) about states and policies, but a function of observations, while the second is a functional of beliefs about all three. In the former (expected free energy), prior beliefs about outcomes are not part of the generative model, because they are absorbed into the prior over policies. Conversely, in the latter (generalised free energy), priors over outcomes become an explicit component of the generative model. When using the functional that is blind to future observations (expected free energy), we equip the generative model with a prior over policies that ensures preferred outcomes (i.e. prior beliefs about outcomes) are realised. In other words, if we expect to encounter a particular kind of outcome, this lends plausibility to those policies that bring that outcome about. In addition, this formulation ensures that selected policies minimise uncertainty about future outcomes by minimising the free energy expected in the future. When using the functional that effectively treats future observations as hidden states (generalised free energy), we show that the policies that are inferred or selected realise prior preferences by minimising the free energy of future expectations. Interestingly, the form of the posterior beliefs about policies (and the associated belief updating) turns out to be identical under both formulations, but the quantities used to compute them are not.
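For orientation, the two schemes can be summarised in standard active inference notation (a schematic sketch rather than the paper's exact derivation; the conditioning and time indexing here are simplified). Under the expected free energy scheme, a policy \pi is scored by the variational free energy of past (observed) outcomes o,

F(\pi) = \mathbb{E}_{Q(s \mid \pi)}\left[ \ln Q(s \mid \pi) - \ln P(o, s \mid \pi) \right],

together with an expected free energy summed over future time points \tau,

G(\pi) = \sum_{\tau} \mathbb{E}_{Q(o_\tau, s_\tau \mid \pi)}\left[ \ln Q(s_\tau \mid \pi) - \ln P(o_\tau, s_\tau) \right],

where prior preferences over outcomes P(o_\tau) enter through the prior over policies rather than the generative model. Both formulations then yield a posterior over policies of the same softmax form,

Q(\pi) = \sigma\left( -F(\pi) - G(\pi) \right),

with the generalised scheme replacing G(\pi) by the free energy of future expectations, in which the unobserved o_\tau are treated as hidden states.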