École Polytechnique Fédérale de Lausanne, School of Computer and Communication Sciences and School of Life Sciences, 1015 Lausanne, Switzerland
Neural Comput. 2021 Feb;33(2):269-340. doi: 10.1162/neco_a_01352. Epub 2021 Jan 5.
Surprise-based learning allows agents to rapidly adapt to nonstationary stochastic environments characterized by sudden changes. We show that exact Bayesian inference in a hierarchical model gives rise to a surprise-modulated trade-off between forgetting old observations and integrating them with the new ones. The modulation depends on a probability ratio, which we call the Bayes Factor Surprise, that tests the prior belief against the current belief. We demonstrate that in several existing approximate algorithms, the Bayes Factor Surprise modulates the rate of adaptation to new observations. We derive three novel surprise-based algorithms, one in the family of particle filters, one in the family of variational learning, and one in the family of message passing, that have constant scaling in observation sequence length and particularly simple update dynamics for any distribution in the exponential family. Empirical results show that these surprise-based algorithms estimate parameters better than alternative approximate approaches and reach levels of performance comparable to computationally more expensive algorithms. The Bayes Factor Surprise is related to but different from the Shannon Surprise. In two hypothetical experiments, we make testable predictions for physiological indicators that dissociate the Bayes Factor Surprise from the Shannon Surprise. The theoretical insight of casting various approaches as surprise-based learning, as well as the proposed online algorithms, may be applied to the analysis of animal and human behavior and to reinforcement learning in nonstationary environments.
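The abstract describes the Bayes Factor Surprise and the surprise-modulated trade-off only in words. The sketch below makes them concrete under several assumptions that the abstract itself does not state. The observations are binary (Bernoulli) and the belief over their rate is a conjugate Beta distribution. S_BF is taken as the ratio of an observation's predictive probability under the prior belief to its predictive probability under the current belief. The adaptation rate is assumed to be gamma = m * S_BF / (1 + m * S_BF), with m = p_c / (1 - p_c) for a change probability p_c. The update mixes current and prior pseudo-counts with weight gamma, in the spirit of a variational surprise-modulated learner. This is an illustration, not the paper's exact algorithm. The names `BetaBelief`, `smile_update`, and `p_change` are hypothetical. Shannon Surprise is computed alongside S_BF to show that it depends only on the current belief.

```python
import math
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(a, b) belief over the rate of a binary observation (assumed model)."""
    a: float  # pseudo-count of observed 1s
    b: float  # pseudo-count of observed 0s

    def predictive(self, y: int) -> float:
        """Marginal probability of observing y in {0, 1} under this belief."""
        p1 = self.a / (self.a + self.b)
        return p1 if y == 1 else 1.0 - p1


def bayes_factor_surprise(y: int, prior: BetaBelief, current: BetaBelief) -> float:
    # Ratio of the observation's probability under the prior belief
    # to its probability under the current belief.
    return prior.predictive(y) / current.predictive(y)


def shannon_surprise(y: int, current: BetaBelief) -> float:
    # Depends only on the current belief, not on the prior.
    return -math.log(current.predictive(y))


def adaptation_rate(s_bf: float, p_change: float) -> float:
    # Assumption: gamma = m * S_BF / (1 + m * S_BF), with m = p_c / (1 - p_c).
    m = p_change / (1.0 - p_change)
    return m * s_bf / (1.0 + m * s_bf)


def smile_update(y: int, prior: BetaBelief, current: BetaBelief, p_change: float):
    """Hypothetical surprise-modulated update: gamma=0 integrates the new
    observation into the current belief, gamma=1 forgets it and restarts from the prior."""
    s_bf = bayes_factor_surprise(y, prior, current)
    gamma = adaptation_rate(s_bf, p_change)
    a = (1.0 - gamma) * current.a + gamma * prior.a + y
    b = (1.0 - gamma) * current.b + gamma * prior.b + (1 - y)
    return BetaBelief(a, b), s_bf, gamma


if __name__ == "__main__":
    prior = BetaBelief(1.0, 1.0)
    belief = prior
    # A long run of 1s followed by a single 0 (a possible "sudden change").
    for y in [1] * 20 + [0]:
        s_sh = shannon_surprise(y, belief)  # evaluated before the update
        belief, s_bf, gamma = smile_update(y, prior, belief, p_change=0.01)
        print(f"y={y}  S_BF={s_bf:.3f}  S_Sh={s_sh:.3f}  gamma={gamma:.3f}")
```

When the current belief equals the prior, S_BF is 1 and gamma reduces to p_c. After a run of 1s, an unexpected 0 yields S_BF well above 1, which raises gamma and shifts the update toward forgetting the old observations. Changing only the prior changes S_BF but leaves the Shannon Surprise untouched. This is the kind of dissociation the abstract proposes to test with physiological indicators.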