Bayesian deterministic decision making: a normative account of the operant matching law and heavy-tailed reward history dependency of choices.

Affiliations

Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan.

Center for Evolutionary Cognitive Sciences, The University of Tokyo, Tokyo, Japan; RIKEN Brain Science Institute, Wako, Japan; Okanoya Emotional Information Project, Exploratory Research for Advanced Technology (ERATO), Japan Science and Technology Agency, Wako, Japan.

Publication Information

Front Comput Neurosci. 2014 Mar 4;8:18. doi: 10.3389/fncom.2014.00018. eCollection 2014.

DOI: 10.3389/fncom.2014.00018
PMID: 24624077
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3940885/
Abstract

The decision making behaviors of humans and animals adapt to, and then satisfy, an "operant matching law" in certain types of tasks. This was first pointed out by Herrnstein in his foraging experiments on pigeons. The matching law has been a landmark for elucidating the processes underlying decision making and learning in the brain. An interesting question is whether decisions are made deterministically or probabilistically. Conventional learning models of the matching law are based on the latter idea: they assume that subjects learn the choice probabilities of the respective alternatives and decide stochastically according to those probabilities. However, it is unknown whether the matching law can also be accounted for by a deterministic strategy. To answer this question, we propose several deterministic Bayesian decision making models that hold certain incorrect beliefs about the environment. We claim that a simple model produces behavior satisfying the matching law in static settings of a foraging task, but not in dynamic settings. We found that a model that believes the environment is volatile works well in the dynamic foraging task and exhibits undermatching, a slight deviation from the matching law observed in many experiments. This model also reproduces the double-exponential reward history dependency of choices and the heavier-tailed run-length distribution recently reported in experiments on monkeys.
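Herrnstein's matching law states that the fraction of choices allocated to an option matches the fraction of total rewards earned from it, i.e. B1/(B1+B2) ≈ R1/(R1+R2). As a rough illustration of the abstract's central point -- that a purely deterministic (greedy) agent with a forgetful, volatility-style estimate of reward rates can produce matching-like behavior on a baited concurrent task -- here is a minimal simulation sketch. The task structure, bait probabilities, forgetting rate, and greedy rule are illustrative assumptions, not the authors' actual Bayesian model.

```python
import random

def simulate(p=(0.3, 0.1), trials=20000, tau=20.0, seed=0):
    """Greedy deterministic agent on a baited two-option foraging task.

    Each option is independently baited with probability p[i] per trial;
    a bait persists until that option is next chosen.  The agent keeps a
    leaky (exponentially forgetting) estimate of each option's reward
    rate -- a crude stand-in for a belief that the environment is
    volatile -- and always picks the option with the higher estimate.
    There is no stochastic choice rule anywhere.
    """
    rng = random.Random(seed)
    baited = [False, False]
    est = [0.5, 0.5]                       # leaky reward-rate estimates
    choices, rewards = [0, 0], [0, 0]
    alpha = 1.0 / tau                      # forgetting rate
    for _ in range(trials):
        for i in range(2):                 # drop new baits
            if rng.random() < p[i]:
                baited[i] = True
        c = 0 if est[0] >= est[1] else 1   # deterministic greedy choice
        r = 1 if baited[c] else 0
        baited[c] = False                  # collecting consumes the bait
        choices[c] += 1
        rewards[c] += r
        est[c] += alpha * (r - est[c])     # update chosen option only
    choice_frac = choices[0] / trials
    reward_frac = rewards[0] / max(1, sum(rewards))
    return choice_frac, reward_frac
```

Because unchosen options accumulate baits, even a greedy rule is pulled back to the leaner option, and the choice fraction ends up tracking the reward fraction fairly closely; the exact degree of (under)matching depends on the assumed parameters.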


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd85/3940885/0370d595f2ca/fncom-08-00018-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd85/3940885/b13a6f375338/fncom-08-00018-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd85/3940885/a3a212ed7e57/fncom-08-00018-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd85/3940885/74d06c48b9f9/fncom-08-00018-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd85/3940885/7cc1c02d9b57/fncom-08-00018-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cd85/3940885/d95fd9eb1812/fncom-08-00018-g0006.jpg

Similar Articles

1
Bayesian deterministic decision making: a normative account of the operant matching law and heavy-tailed reward history dependency of choices.
Front Comput Neurosci. 2014 Mar 4;8:18. doi: 10.3389/fncom.2014.00018. eCollection 2014.
2
Reward expectations direct learning and drive operant matching in Drosophila.
Proc Natl Acad Sci U S A. 2023 Sep 26;120(39):e2221415120. doi: 10.1073/pnas.2221415120. Epub 2023 Sep 21.
3
The actor-critic learning is behind the matching law: matching versus optimal behaviors.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
4
Statistical mechanics of reward-modulated learning in decision-making networks.
Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.
5
Matching Behavior as a Tradeoff Between Reward Maximization and Demands on Neural Computation.
F1000Res. 2015 Jun 9;4:147. doi: 10.12688/f1000research.6574.2. eCollection 2015.
6
Operant matching as a Nash equilibrium of an intertemporal game.
Neural Comput. 2009 Oct;21(10):2755-73. doi: 10.1162/neco.2009.09-08-854.
7
Undermatching Is a Consequence of Policy Compression.
J Neurosci. 2023 Jan 18;43(3):447-457. doi: 10.1523/JNEUROSCI.1003-22.2022. Epub 2022 Dec 6.
8
Optimal decision making and matching are tied through diminishing returns.
Proc Natl Acad Sci U S A. 2017 Aug 8;114(32):8499-8504. doi: 10.1073/pnas.1703440114. Epub 2017 Jul 24.
9
A biophysically based neural model of matching law behavior: melioration by stochastic synapses.
J Neurosci. 2006 Apr 5;26(14):3731-44. doi: 10.1523/JNEUROSCI.5159-05.2006.
10
Learning the opportunity cost of time in a patch-foraging task.
Cogn Affect Behav Neurosci. 2015 Dec;15(4):837-53. doi: 10.3758/s13415-015-0350-y.

Cited By

1
Further examining how animals weigh conflicting information about reward sources over time.
Anim Cogn. 2025 Jul 30;28(1):74. doi: 10.1007/s10071-025-01982-x.
2
How do animals weigh conflicting information about reward sources over time? Comparing dynamic averaging models.
Anim Cogn. 2024 Mar 2;27(1):11. doi: 10.1007/s10071-024-01840-2.
3
Undermatching Is a Consequence of Policy Compression.
J Neurosci. 2023 Jan 18;43(3):447-457. doi: 10.1523/JNEUROSCI.1003-22.2022. Epub 2022 Dec 6.

References

1
Sequential effects: Superstition or rational behavior?
Adv Neural Inf Process Syst. 2008;21:1873-1880.
2
Statistical mechanics of reward-modulated learning in decision-making networks.
Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.
3
A reservoir of time constants for memory traces in cortical neurons.
Nat Neurosci. 2011 Mar;14(3):366-72. doi: 10.1038/nn.2752. Epub 2011 Feb 13.
4
Explicit melioration by a neural diffusion model.
Brain Res. 2009 Nov 24;1299:95-117. doi: 10.1016/j.brainres.2009.07.017. Epub 2009 Jul 30.
5
When does reward maximization lead to matching law?
PLoS One. 2008;3(11):e3795. doi: 10.1371/journal.pone.0003795. Epub 2008 Nov 24.
6
Neurobiological models of two-choice decision making can be reduced to a one-dimensional nonlinear diffusion equation.
PLoS Comput Biol. 2008 Mar 28;4(3):e1000046. doi: 10.1371/journal.pcbi.1000046.
7
Robustness of learning that is based on covariance-driven synaptic plasticity.
PLoS Comput Biol. 2008 Mar 7;4(3):e1000007. doi: 10.1371/journal.pcbi.1000007.
8
The actor-critic learning is behind the matching law: matching versus optimal behaviors.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
9
Optimization and applications of echo state networks with leaky-integrator neurons.
Neural Netw. 2007 Apr;20(3):335-52. doi: 10.1016/j.neunet.2007.04.016. Epub 2007 May 3.
10
A neural circuit model of flexible sensorimotor mapping: learning and forgetting on multiple timescales.
Neuron. 2007 Apr 19;54(2):319-33. doi: 10.1016/j.neuron.2007.03.017.