Zsuga Judit, Biro Klara, Tajti Gabor, Szilasi Magdolna Emma, Papp Csaba, Juhasz Bela, Gesztelyi Rudolf
Department of Health Systems Management and Quality Management for Health Care, Faculty of Public Health, University of Debrecen, Debrecen, Nagyerdei krt. 98, 4032, Hungary.
Department of Pharmacology, Faculty of Pharmacy, University of Debrecen, Debrecen, Nagyerdei krt. 98, 4032, Hungary.
BMC Neurosci. 2016 Oct 28;17(1):70. doi: 10.1186/s12868-016-0302-7.
Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the value of a state as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent's knowledge of the environment, embodied in the reward function and in hidden environmental factors given by the transition probability. The central objective of reinforcement learning is to solve these two functions, which lie outside the agent's control, either with or without a model.
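In its standard textbook form (not quoted from the paper itself), the Bellman equation described above expresses the value of a state s under a policy π as the immediate reward plus the discounted, transition-weighted value of successor states:

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^{\pi}(s') \right]

Here π(a|s) is the policy, P(s'|s,a) the transition probability, R(s,a,s') the reward function, and γ the discount factor; the reward and transition functions are the two environment-dependent quantities that reinforcement learning must estimate, with or without an explicit model.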
In the present paper, using the proactive model of reinforcement learning, we offer insight into how the brain creates simplified representations of the environment and how these representations are organized to support the identification of relevant stimuli and actions. Furthermore, we identify neurobiological correlates of our model by suggesting that the reward and policy functions, attributes of the Bellman equation, are built by the orbitofrontal cortex (OFC) and the anterior cingulate cortex (ACC), respectively.
Based on this, we propose that the OFC assesses cue-context congruence to activate the most relevant context frame. Furthermore, given the bidirectional neuroanatomical link between the OFC and model-free structures, we suggest that model-based input is incorporated into the reward prediction error (RPE) signal and, conversely, that the RPE signal may be used to update the reward-related information of context frames and the policy underlying action selection in the OFC and ACC, respectively. Finally, clinical implications for cognitive behavioral interventions are discussed.
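For reference, the reward prediction error invoked here is conventionally written as the temporal-difference error (a standard formulation, not the paper's own notation):

\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t

where α is a learning rate. In the proposed scheme, such an RPE signal would carry model-based contributions from the OFC and, in turn, serve to update the reward-related content of context frames in the OFC and the action-selection policy in the ACC.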