'Proactive' use of cue-context congruence for building reinforcement learning's reward function.

Author information

Zsuga Judit, Biro Klara, Tajti Gabor, Szilasi Magdolna Emma, Papp Csaba, Juhasz Bela, Gesztelyi Rudolf

Affiliations

Department of Health Systems Management and Quality Management for Health Care, Faculty of Public Health, University of Debrecen, Debrecen, Nagyerdei krt. 98, 4032, Hungary.

Department of Pharmacology, Faculty of Pharmacy, University of Debrecen, Debrecen, Nagyerdei krt. 98, 4032, Hungary.

Publication information

BMC Neurosci. 2016 Oct 28;17(1):70. doi: 10.1186/s12868-016-0302-7.

Abstract

BACKGROUND

Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the value of a state as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent's knowledge of the environment, embodied in the reward function and in the hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions, which lie outside the agent's control, either with or without a model.
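For reference, a standard form of the Bellman equation described above, written in conventional notation that is assumed here rather than given in the abstract, is

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{\pi}(s') \bigr],$$

where $\pi$ is the policy, $\gamma$ the discount factor, $R$ the reward function, and $P$ the transition probability; $R$ and $P$ are the two functions outside the agent's control referred to above.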

RESULTS

In the present paper, using the proactive model of reinforcement learning, we offer insight into how the brain creates simplified representations of the environment, and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively.

CONCLUSIONS

Based on this, we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal, and conversely that the RPE signal may be used to update the reward-related information of context frames in the OFC and the policy underlying action selection in the ACC. Finally, clinical implications for cognitive behavioral interventions are discussed.
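As a sketch of how such an update could work, the RPE invoked here is conventionally modeled as the temporal-difference error, written in standard notation (assumed, not taken from the paper) as

$$\delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t,$$

where $\alpha$ is a learning rate. Under the proposal above, this error signal would both incorporate model-based input and serve to update the reward-related content of the currently active context frame.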

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba06/5086043/283aa08a8463/12868_2016_302_Fig1_HTML.jpg
