

The "proactive" model of learning: Integrative framework for model-free and model-based reinforcement learning utilizing the associative learning-based proactive brain concept.

Author Information

Zsuga Judit, Biro Klara, Papp Csaba, Tajti Gabor, Gesztelyi Rudolf

Affiliations

Department of Health Systems Management and Quality Management for Health Care, Faculty of Public Health, University of Debrecen.

Department of Pharmacology, Faculty of Pharmacy, University of Debrecen.

Publication Information

Behav Neurosci. 2016 Feb;130(1):6-18. doi: 10.1037/bne0000116.

Abstract

Reinforcement learning (RL) is a powerful concept underlying forms of associative learning governed by a scalar reward signal, with learning taking place when expectations are violated. RL may be assessed using model-based and model-free approaches. Model-based reinforcement learning involves the amygdala, the hippocampus, and the orbitofrontal cortex (OFC). The model-free system involves the pedunculopontine-tegmental nucleus (PPTgN), the ventral tegmental area (VTA), and the ventral striatum (VS). Based on the functional connectivity of the VS, the model-free and model-based RL systems center on the VS, which computes value by integrating model-free signals (received as reward prediction errors) with model-based, reward-related input. Using the concept of the reinforcement learning agent, we propose that the VS serves as the value function component of the RL agent. Regarding the model used for model-based computations, we turn to the proactive brain concept, which assigns a ubiquitous function to the default network based on its large functional overlap with contextual associative areas. By means of the default network, the brain continuously organizes its environment into context frames, enabling the formation of analogy-based associations that are turned into predictions of what to expect. The OFC integrates reward-related information into these context frames when computing reward expectation, compiling the stimulus-reward and context-reward information offered by the amygdala and hippocampus, respectively. Furthermore, we suggest that the integration of model-based reward expectations into the value signal is further supported by efferents of the OFC that reach structures canonical for model-free learning (e.g., the PPTgN, VTA, and VS).
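To make the computational reading of this proposal concrete, the sketch below illustrates the two signals the abstract refers to: a model-free temporal-difference reward prediction error and a model-based reward expectation learned per context, blended into a single value estimate. This is a minimal illustrative sketch, not the authors' model; the variable names, the per-state reward model, and the mixing weight w are assumptions introduced purely for illustration.

import numpy as np

# Illustrative sketch (assumed, not from the paper): a hybrid agent whose value
# estimate blends a model-free cached value, updated by a reward prediction
# error, with a model-based expectation of reward learned for each state/context.

n_states = 5
alpha = 0.1          # learning rate
w = 0.5              # assumed weight given to the model-based expectation

V_mf = np.zeros(n_states)          # model-free (cached) state values
reward_model = np.zeros(n_states)  # model-based expected reward per state ("context frame")

def update(state, reward, next_state, gamma=0.9):
    """One learning step: compute the reward prediction error and blend values."""
    # Model-free: temporal-difference reward prediction error (dopamine-like signal)
    rpe = reward + gamma * V_mf[next_state] - V_mf[state]
    V_mf[state] += alpha * rpe

    # Model-based: incrementally learn the expected reward for this state/context
    reward_model[state] += alpha * (reward - reward_model[state])

    # Integrated value: model-free cache combined with model-based expectation
    return (1 - w) * V_mf[state] + w * reward_model[state]

# Example: repeated experience in state 2 that reliably yields reward
for _ in range(20):
    v = update(state=2, reward=1.0, next_state=3)
print(f"integrated value of state 2: {v:.3f}")

In this toy example the prediction error shrinks as the reward becomes well predicted, mirroring the abstract's point that learning takes place only when expectations are violated.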

