
When does reward maximization lead to matching law?

Authors

Sakai Yutaka, Fukai Tomoki

Affiliation

Brain Science Institute, Tamagawa University, Machida, Tokyo, Japan.

Publication

PLoS One. 2008;3(11):e3795. doi: 10.1371/journal.pone.0003795. Epub 2008 Nov 24.

DOI: 10.1371/journal.pone.0003795
PMID: 19030101
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2582656/
Abstract

What kind of strategies subjects follow in various behavioral circumstances has been a central issue in decision making. In particular, which behavioral strategy, maximizing or matching, is more fundamental to animal's decision behavior has been a matter of debate. Here, we prove that any algorithm to achieve the stationary condition for maximizing the average reward should lead to matching when it ignores the dependence of the expected outcome on subject's past choices. We may term this strategy of partial reward maximization "matching strategy". Then, this strategy is applied to the case where the subject's decision system updates the information for making a decision. Such information includes subject's past actions or sensory stimuli, and the internal storage of this information is often called "state variables". We demonstrate that the matching strategy provides an easy way to maximize reward when combined with the exploration of the state variables that correctly represent the crucial information for reward maximization. Our results reveal for the first time how a strategy to achieve matching behavior is beneficial to reward maximization, achieving a novel insight into the relationship between maximizing and matching.
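For context, the two strategies contrasted in the abstract can be stated compactly. These are the standard formulations from the operant-conditioning literature, not equations taken from this paper:

```latex
% Herrnstein's matching law: the fraction of responses allocated to an
% option equals the fraction of rewards obtained from it
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
% where B_i is the number of responses to option i and
% R_i is the number of rewards obtained from option i.

% Reward maximization instead selects the allocation that maximizes
% the expected total reward rate
(B_1^{*}, B_2^{*}) = \operatorname*{arg\,max}_{B_1, B_2} \,
  \mathbb{E}\left[ R_1(B_1, B_2) + R_2(B_1, B_2) \right]
```

The paper's central result concerns when algorithms aiming at the second condition end up satisfying the first: this happens when the algorithm ignores how expected outcomes depend on the subject's own past choices.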


Figures

Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64ab/2582656/1087a1566249/pone.0003795.g001.jpg
Fig 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64ab/2582656/f75a9c99a030/pone.0003795.g002.jpg
Fig 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64ab/2582656/5c003cb25450/pone.0003795.g003.jpg

Similar Articles

1. When does reward maximization lead to matching law?
PLoS One. 2008;3(11):e3795. doi: 10.1371/journal.pone.0003795. Epub 2008 Nov 24.
2. Optimal decision making and matching are tied through diminishing returns.
Proc Natl Acad Sci U S A. 2017 Aug 8;114(32):8499-8504. doi: 10.1073/pnas.1703440114. Epub 2017 Jul 24.
3. Operant matching as a Nash equilibrium of an intertemporal game.
Neural Comput. 2009 Oct;21(10):2755-73. doi: 10.1162/neco.2009.09-08-854.
4. The actor-critic learning is behind the matching law: matching versus optimal behaviors.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
5. Statistical mechanics of reward-modulated learning in decision-making networks.
Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.
6. Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity.
Proc Natl Acad Sci U S A. 2006 Oct 10;103(41):15224-9. doi: 10.1073/pnas.0505220103. Epub 2006 Sep 28.
7. Mechanisms of reinforcement learning and decision making in the primate dorsolateral prefrontal cortex.
Ann N Y Acad Sci. 2007 May;1104:108-22. doi: 10.1196/annals.1390.007. Epub 2007 Mar 8.
8. Dynamic signals related to choices and outcomes in the dorsolateral prefrontal cortex.
Cereb Cortex. 2007 Sep;17 Suppl 1:i110-7. doi: 10.1093/cercor/bhm064. Epub 2007 Jun 4.
9. Policy adjustment in a dynamic economic game.
PLoS One. 2006 Dec 20;1(1):e103. doi: 10.1371/journal.pone.0000103.
10. Reward-based training of recurrent neural networks for cognitive and value-based tasks.
Elife. 2017 Jan 13;6:e21492. doi: 10.7554/eLife.21492.

Cited By

1. Undermatching Is a Consequence of Policy Compression.
J Neurosci. 2023 Jan 18;43(3):447-457. doi: 10.1523/JNEUROSCI.1003-22.2022. Epub 2022 Dec 6.
2. Dynamic decision making and value computations in medial frontal cortex.
Int Rev Neurobiol. 2021;158:83-113. doi: 10.1016/bs.irn.2020.12.001. Epub 2021 Jan 23.
3. The Relevance of Operant Behavior in Conceptualizing the Psychological Well-Being of Captive Animals.
Perspect Behav Sci. 2020 Aug 10;43(3):617-654. doi: 10.1007/s40614-020-00259-7. eCollection 2020 Sep.
4. Stable Representations of Decision Variables for Flexible Behavior.
Neuron. 2019 Sep 4;103(5):922-933.e7. doi: 10.1016/j.neuron.2019.06.001. Epub 2019 Jul 4.
5. A Free-Operant Reward-Tracking Paradigm to Study Neural Mechanisms and Neurochemical Modulation of Adaptive Behavior in Rats.
Int J Mol Sci. 2019 Jun 25;20(12):3098. doi: 10.3390/ijms20123098.
6. Using psychophysics to ask if the brain samples or maximizes.
J Vis. 2015 Mar 12;15(3):7. doi: 10.1167/15.3.7.
7. Bursts and heavy tails in temporal and sequential dynamics of foraging decisions.
PLoS Comput Biol. 2014 Aug 14;10(8):e1003759. doi: 10.1371/journal.pcbi.1003759. eCollection 2014 Aug.
8. Bayesian deterministic decision making: a normative account of the operant matching law and heavy-tailed reward history dependency of choices.
Front Comput Neurosci. 2014 Mar 4;8:18. doi: 10.3389/fncom.2014.00018. eCollection 2014.
9. Dynamical regimes in neural network models of matching behavior.
Neural Comput. 2013 Dec;25(12):3093-112. doi: 10.1162/NECO_a_00522. Epub 2013 Sep 18.
10. Optimizing vs. matching: response strategy in a probabilistic learning task is associated with negative symptoms of schizophrenia.
Schizophr Res. 2011 Apr;127(1-3):215-22. doi: 10.1016/j.schres.2010.12.003. Epub 2011 Jan 15.

References

1. The actor-critic learning is behind the matching law: matching versus optimal behaviors.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
2. Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity.
Proc Natl Acad Sci U S A. 2006 Oct 10;103(41):15224-9. doi: 10.1073/pnas.0505220103. Epub 2006 Sep 28.
3. Computational algorithms and neuronal network models underlying decision processes.
Neural Netw. 2006 Oct;19(8):1091-105. doi: 10.1016/j.neunet.2006.05.034. Epub 2006 Aug 30.
4. Stimuli, reinforcers, and behavior: an integration.
J Exp Anal Behav. 1999 May;71(3):439-82. doi: 10.1901/jeab.1999.71-439.
5. Sensitivity of time allocation to an overall reinforcer rate feedback function in concurrent interval schedules.
J Exp Anal Behav. 1989 Mar;51(2):215-31. doi: 10.1901/jeab.1989.51-215.
6. Optimization and the matching law as accounts of instrumental behavior.
J Exp Anal Behav. 1981 Nov;36(3):387-403. doi: 10.1901/jeab.1981.36-387.
7. Melioration, matching, and maximization.
J Exp Anal Behav. 1981 Sep;36(2):141-9. doi: 10.1901/jeab.1981.36-141.
8. Is matching compatible with reinforcement maximization on concurrent variable interval variable ratio?
J Exp Anal Behav. 1979 Mar;31(2):209-23. doi: 10.1901/jeab.1979.31-209.
9. A Markov model description of changeover probabilities on concurrent variable-interval schedules.
J Exp Anal Behav. 1979 Jan;31(1):41-51. doi: 10.1901/jeab.1979.31-41.
10. A biophysically based neural model of matching law behavior: melioration by stochastic synapses.
J Neurosci. 2006 Apr 5;26(14):3731-44. doi: 10.1523/JNEUROSCI.5159-05.2006.