Champion Théophile, Grześ Marek, Bonheme Lisa, Bowman Howard
University of Birmingham, School of Computer Science, Birmingham B15 2TT, U.K.
University of Kent, School of Computing, Canterbury CT2 7NZ, U.K.
Neural Comput. 2024 Oct 11;36(11):2403-2445. doi: 10.1162/neco_a_01697.
Active inference is a theory of perception, learning, and decision making that can be applied to neuroscience, robotics, psychology, and machine learning. Recently, intensive research has aimed to scale up this framework using Monte Carlo tree search and deep learning, with the goal of solving more complicated tasks using deep active inference. First, we review the existing literature and then progressively build a deep active inference agent as follows: we (1) implement a variational autoencoder (VAE), (2) implement a deep hidden Markov model (HMM), and (3) implement a deep critical hidden Markov model (CHMM). For the CHMM, we implemented two versions: one minimizing expected free energy, CHMM[EFE], and one maximizing reward, CHMM[reward]. Then we experimented with three different action selection strategies: the ε-greedy algorithm, softmax selection, and best action selection. According to our experiments, the models able to solve the dSprites environment are the ones that maximize reward. On further inspection, we found that the CHMM minimizing expected free energy almost always picks the same action, which makes it unable to solve the dSprites environment. In contrast, the CHMM maximizing reward keeps selecting all the actions, enabling it to successfully solve the task. The only difference between those two CHMMs is the epistemic value, which aims to make the outputs of the transition and encoder networks as close as possible. Thus, the CHMM minimizing expected free energy repeatedly picks a single action and becomes an expert at predicting the future when selecting this action, which effectively keeps the KL divergence between the outputs of the transition and encoder networks small. Additionally, when selecting the action "down", the average reward is zero, while for all the other actions, the expected reward is negative.
Therefore, if the CHMM has to stick to a single action to keep the KL divergence small, then the action "down" is the most rewarding. We also show in simulation that the epistemic value used in deep active inference can behave degenerately and, in certain circumstances, effectively lose rather than gain information. As the agent minimizing EFE is unable to explore its environment, the appropriate formulation of the epistemic value in deep active inference remains an open question.
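The abstract describes the epistemic value as a term that draws the outputs of the transition and encoder networks together. As a rough illustration only (the paper's exact networks and parameterization are not given here), assuming both networks output diagonal Gaussians over the latent state, the KL divergence between them can be sketched as:

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

# Hypothetical outputs: encoder posterior vs. transition prediction over
# the next latent state. The KL is zero exactly when the two agree, which
# is what an agent repeating a single, well-predicted action achieves.
mu_enc, var_enc = np.array([0.1, -0.2]), np.array([1.0, 1.0])
mu_trans, var_trans = np.array([0.1, -0.2]), np.array([1.0, 1.0])
print(gaussian_kl(mu_enc, var_enc, mu_trans, var_trans))  # 0.0 when the networks agree
```

This makes the degenerate behavior concrete: an agent can shrink this term simply by becoming highly predictable, that is, by always picking the action it models best.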
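The three action selection strategies compared in the experiments (ε-greedy, softmax, and best action selection) can be sketched as follows. This is a generic illustration, not the paper's implementation; the per-action `values` could be, for example, negative expected free energies or expected rewards:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(values, strategy="best", epsilon=0.1, temperature=1.0):
    """Return an action index given a value per action."""
    values = np.asarray(values, dtype=float)
    if strategy == "best":
        # Always exploit: pick the highest-valued action.
        return int(np.argmax(values))
    if strategy == "epsilon_greedy":
        # With probability epsilon, pick a uniformly random action.
        if rng.random() < epsilon:
            return int(rng.integers(len(values)))
        return int(np.argmax(values))
    if strategy == "softmax":
        # Sample in proportion to exp(value / temperature).
        z = values / temperature
        p = np.exp(z - z.max())
        p /= p.sum()
        return int(rng.choice(len(values), p=p))
    raise ValueError(f"unknown strategy: {strategy}")
```

Only softmax and ε-greedy inject stochasticity; best action selection is fully deterministic, which matters for an agent whose value function already favors a single action.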