Uncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats.

Affiliation

Neural Computation Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna-son, Kunigami, Okinawa 904-0412, Japan.

Publication information

Eur J Neurosci. 2012 Apr;35(7):1180-9. doi: 10.1111/j.1460-9568.2012.08025.x.

DOI:10.1111/j.1460-9568.2012.08025.x
PMID:22487046
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3380560/
Abstract

The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward outcome estimation affects the action choice and learning rate. We designed a choice task in which rats selected either the left-poking or right-poking hole and received a reward of a food pellet stochastically. The reward probabilities of the left and right holes were chosen from six settings (high, 100% vs. 66%; mid, 66% vs. 33%; low, 33% vs. 0% for the left vs. right holes, and the opposites) in every 20-549 trials. We used Bayesian Q-learning models to estimate the time course of the probability distribution of action values and tested if they better explain the behaviors of rats than standard Q-learning models that estimate only the mean of action values. Model comparison by cross-validation revealed that a Bayesian Q-learning model with an asymmetric update for reward and non-reward outcomes fit the choice time course of the rats best. In the action-choice equation of the Bayesian Q-learning model, the estimated coefficient for the variance of action value was positive, meaning that rats were uncertainty seeking. Further analysis of the Bayesian Q-learning model suggested that the uncertainty facilitated the effective learning rate. These results suggest that the rats consider uncertainty in action-value estimation and that they have an uncertainty-seeking action policy and uncertainty-dependent modulation of the effective learning rate.
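The abstract describes two effects of uncertainty: a positive variance term in the action-choice equation, and a larger effective learning rate when uncertainty is high. The following Python sketch illustrates both under simplifying assumptions. It is not the authors' model: the Gaussian (Kalman-style) belief update, the class name `GaussianQLearner`, and the parameters `beta`, `phi`, `obs_noise`, and `drift` are hypothetical choices made for illustration, and the asymmetric update for reward and non-reward outcomes mentioned in the abstract is omitted.

```python
import math
import random

# The six (left, right) reward-probability settings listed in the abstract.
SETTINGS = [(1.0, 0.66), (0.66, 1.0), (0.66, 0.33),
            (0.33, 0.66), (0.33, 0.0), (0.0, 0.33)]


class GaussianQLearner:
    """Tracks a mean and a variance per action value (illustrative assumption)."""

    def __init__(self, beta=3.0, phi=1.0, obs_noise=0.25, drift=0.001):
        self.mean = [0.5, 0.5]    # belief mean for left (0) and right (1)
        self.var = [0.25, 0.25]   # belief variance, i.e. uncertainty
        self.beta = beta          # weight on the difference of means
        self.phi = phi            # weight on the difference of variances;
                                  # phi > 0 corresponds to uncertainty seeking
        self.obs_noise = obs_noise  # assumed outcome noise
        self.drift = drift          # assumed per-trial growth of uncertainty

    def p_left(self):
        """Logistic choice rule: mean difference plus an uncertainty bonus."""
        z = (self.beta * (self.mean[0] - self.mean[1])
             + self.phi * (self.var[0] - self.var[1]))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, action, reward):
        """Kalman-style update; returns the effective learning rate used."""
        gain = self.var[action] / (self.var[action] + self.obs_noise)
        self.mean[action] += gain * (reward - self.mean[action])
        self.var[action] *= 1.0 - gain
        # Both beliefs become slightly less certain after every trial.
        self.var = [v + self.drift for v in self.var]
        return gain


def simulate(n_trials=200, setting=SETTINGS[2], seed=0):
    """Run the learner on one block with fixed reward probabilities."""
    rng = random.Random(seed)
    agent = GaussianQLearner()
    gains = []
    for _ in range(n_trials):
        action = 0 if rng.random() < agent.p_left() else 1
        reward = 1.0 if rng.random() < setting[action] else 0.0
        gains.append(agent.update(action, reward))
    return agent, gains
```

In this sketch the effective learning rate is the gain `var / (var + obs_noise)`, so it shrinks as an action is sampled repeatedly and its variance falls. With `phi > 0`, the less-sampled (more uncertain) action receives a choice bonus.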

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f990/3380560/9349aa66a427/ejn0035-1180-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f990/3380560/68af0317ac00/ejn0035-1180-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f990/3380560/42ab1d17d091/ejn0035-1180-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f990/3380560/23a79b168618/ejn0035-1180-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f990/3380560/c139658f7a39/ejn0035-1180-f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f990/3380560/b83f9d8298ff/ejn0035-1180-f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f990/3380560/5278258bacfe/ejn0035-1180-f7.jpg
