Neural mechanism for stochastic behaviour during a competitive game.

Authors

Soltani Alireza, Lee Daeyeol, Wang Xiao-Jing

Affiliation

Department of Physics and Volen Center for Complex Systems, Brandeis University, Waltham, MA 02454, USA.

Publication information

Neural Netw. 2006 Oct;19(8):1075-90. doi: 10.1016/j.neunet.2006.05.044.

Abstract

Previous studies have shown that non-human primates can generate highly stochastic choice behaviour, especially when this is required during a competitive interaction with another agent. To understand the neural mechanism of such dynamic choice behaviour, we propose a biologically plausible model of decision making endowed with synaptic plasticity that follows a reward-dependent stochastic Hebbian learning rule. This model constitutes a biophysical implementation of reinforcement learning, and it reproduces salient features of behavioural data from an experiment with monkeys playing a matching pennies game. Due to interaction with an opponent and learning dynamics, the model generates quasi-random behaviour robustly in spite of intrinsic biases. Furthermore, non-random choice behaviour can also emerge when the model plays against a non-interactive opponent, as observed in the monkey experiment. Finally, when combined with a meta-learning algorithm, our model accounts for the slow drift in the animal's strategy based on a process of reward maximization.
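The abstract does not give the model's equations, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes a mean-field simplification in which each choice is backed by a synaptic strength in [0, 1], the chosen option's strength is potentiated after reward and depressed after no reward, and choice probability is a sigmoid of the strength difference. The function names, the parameters `q_plus`, `q_minus`, `sigma`, and `window`, and the simple majority-predicting opponent are all hypothetical choices made for illustration.

```python
import math
import random


def choice_prob(c_left, c_right, sigma=0.1):
    """Probability of choosing 'left': a sigmoid of the strength difference.

    sigma is an assumed noise parameter; smaller values give more
    deterministic choices.
    """
    return 1.0 / (1.0 + math.exp(-(c_left - c_right) / sigma))


def update_strength(c, rewarded, q_plus=0.1, q_minus=0.1):
    """Assumed mean-field reward-dependent update for the chosen option.

    Reward moves the strength toward 1 at rate q_plus; no reward moves it
    toward 0 at rate q_minus. The result always stays within [0, 1].
    """
    if rewarded:
        return c + q_plus * (1.0 - c)
    return c - q_minus * c


def play_matching_pennies(n_trials=5000, seed=0, window=20):
    """Simulate the agent against a hypothetical bias-exploiting opponent.

    Returns (fraction of 'left' choices, reward rate, final strengths).
    """
    rng = random.Random(seed)
    c = [0.5, 0.5]   # strengths for left (index 0) and right (index 1)
    history = []     # past choices of the agent: 0 = left, 1 = right
    n_left = 0
    n_reward = 0
    for _ in range(n_trials):
        p_left = choice_prob(c[0], c[1])
        choice = 0 if rng.random() < p_left else 1

        # Hypothetical opponent: predicts the agent's majority choice over
        # the last `window` trials, guessing at random on a tie or when
        # there is no history yet.
        recent = history[-window:]
        n_right_recent = sum(recent)
        if recent and 2 * n_right_recent != len(recent):
            predicted = 1 if 2 * n_right_recent > len(recent) else 0
        else:
            predicted = rng.randrange(2)

        # Simplified payoff: the agent is rewarded when the opponent's
        # prediction is wrong.
        rewarded = choice != predicted

        # Only the chosen option's strength is updated.
        c[choice] = update_strength(c[choice], rewarded)

        history.append(choice)
        n_left += 1 if choice == 0 else 0
        n_reward += 1 if rewarded else 0
    return n_left / n_trials, n_reward / n_trials, c


if __name__ == "__main__":
    frac_left, reward_rate, strengths = play_matching_pennies()
    print("fraction left:", frac_left)
    print("reward rate:", reward_rate)
    print("final strengths:", strengths)
```

In this sketch the opponent penalizes any persistent preference, so a biased agent loses reward on its preferred side and its strengths are pushed back toward balance. This is the kind of interaction between an opponent and learning dynamics that the abstract credits for robust quasi-random behaviour, shown here only in a toy form.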
