Adaptive learning via selectionism and Bayesianism, Part I: connection between the two.

Author information

Zhang Jun

Affiliation

Department of Psychology, University of Michigan, 530 Church Street, Ann Arbor 48109-1043, USA.

Publication information

Neural Netw. 2009 Apr;22(3):220-8. doi: 10.1016/j.neunet.2009.03.018. Epub 2009 Apr 5.

Abstract

According to the selection-by-consequence characterization of operant learning, individual animals/species increase or decrease their future probability of action choices based on the consequence (i.e., reward or punishment) of the currently selected action (the so-called "Law of Effect"). Under Bayesianism, on the other hand, evidence is evaluated based on likelihood functions so that action probability is modified from a priori to a posteriori according to the Bayes formula. Viewed as hypothesis testing, a selectionist framework attributes evidence exclusively to the selected, focal hypothesis, whereas a Bayesian framework distributes the support from a piece of evidence across all hypotheses. Here, an intimate connection between the two theoretical frameworks is revealed. Specifically, it is proven that when individuals modify their action choices based on the selectionist's Law of Effect, the learning population, on the ensemble level, evolves according to Bayesian-like dynamics. The learning equation of the linear operator model [Bush, R. R., & Mosteller, F. (1955). Stochastic models for learning. New York: John Wiley and Sons], under ensemble averaging, yields the class of predictive reinforcement learning models (e.g., [Busemeyer, J. R., & Myung, I. J. (1992). An adaptive approach to human decision making: Learning theory, decision theory, and human performance. Journal of Experimental Psychology: General, 121, 177-194; Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16, 1936-1947]).
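
The ensemble-level claim can be illustrated with a small numerical sketch. It is not the paper's own derivation. It assumes a reward-inaction variant of the linear operator rule (choice probabilities change only after a rewarded choice), Bernoulli rewards, and hypothetical names and values (`theta`, `reward_prob`, `p`). Under those assumptions, the expected one-step change in the choice probabilities, averaged over all action and outcome combinations, is the Bayes increment scaled by `theta` times the mean reward.

```python
def linear_operator_update(p, chosen, rewarded, theta):
    """One learner's step: credit goes only to the chosen action (Law of Effect)."""
    if not rewarded:
        # Assumption of this sketch: no change when the choice is not rewarded.
        return list(p)
    # Shrink every probability by (1 - theta), then add theta to the chosen action.
    return [(1 - theta) * pk + (theta if k == chosen else 0.0)
            for k, pk in enumerate(p)]


def expected_update(p, reward_prob, theta):
    """Ensemble average of the next choice probabilities, by exact enumeration."""
    n = len(p)
    avg = [0.0] * n
    for i in range(n):  # action i is chosen with probability p[i]
        outcomes = ((True, reward_prob[i]), (False, 1.0 - reward_prob[i]))
        for rewarded, weight in outcomes:
            nxt = linear_operator_update(p, i, rewarded, theta)
            for k in range(n):
                avg[k] += p[i] * weight * nxt[k]
    return avg


def bayes_posterior(p, reward_prob):
    """Bayes rule with p as the prior and reward_prob as the likelihood."""
    z = sum(pk * rk for pk, rk in zip(p, reward_prob))
    return [pk * rk / z for pk, rk in zip(p, reward_prob)]


# Hypothetical example values, chosen only for illustration.
p = [0.5, 0.3, 0.2]            # current choice probabilities
reward_prob = [0.2, 0.5, 0.9]  # chance that each action is rewarded
theta = 0.1                    # learning rate of the linear operator

mean_reward = sum(pk * rk for pk, rk in zip(p, reward_prob))
averaged = expected_update(p, reward_prob, theta)

# Expected change under the selectionist rule ...
drift = [a - pk for a, pk in zip(averaged, p)]
# ... compared with the Bayes increment scaled by theta * mean_reward.
bayes_step = [theta * mean_reward * (q - pk)
              for q, pk in zip(bayes_posterior(p, reward_prob), p)]

for d, b in zip(drift, bayes_step):
    print(f"{d:+.6f}  {b:+.6f}")
```

Each learner credits only the action it chose, yet the two printed columns agree. In this sketch, the averaged update moves every action's probability in the direction of the Bayes posterior, in proportion to how far that action's reward rate lies from the population mean.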