
Time elapsed between choices in a probabilistic task correlates with repeating the same decision.

Affiliations

Department of Molecular Neuropharmacology, Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland.

Department of Structure Research of Condensed Matter, The Henryk Niewodniczański Institute of Nuclear Physics, Polish Academy of Sciences, Krakow, Poland.

Publication information

Eur J Neurosci. 2021 Apr;53(8):2639-2654. doi: 10.1111/ejn.15144. Epub 2021 Mar 2.

DOI: 10.1111/ejn.15144
PMID: 33559232
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8248175/
Abstract

Reinforcement learning makes an action that yields a positive outcome more likely to be taken in the future. Here, we investigate how the time elapsed from an action affects subsequent decisions. Groups of C57BL6/J mice were housed in IntelliCages with access to water and chow ad libitum; they also had access to bottles with a reward: saccharin solution, alcohol, or a mixture of the two. The probability of receiving a reward in two of the cage corners changed between 0.9 and 0.3 every 48 hr over a period of ~33 days. As expected, in most animals, the odds of repeating a corner choice were increased if that choice was previously rewarded. Interestingly, the time elapsed from the previous choice also influenced the probability of repeating the choice, and this effect was independent of previous outcome. Behavioral data were fitted to a series of reinforcement learning models. Best fits were achieved when the reward prediction update was coupled with separate learning rates from positive and negative outcomes and additionally a "fictitious" update of the expected value of the nonselected choice. Additional inclusion of a time-dependent decay of the expected values improved the fit marginally in some cases.
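The best-fitting model described above combines three ingredients: outcome-dependent learning rates, a "fictitious" update of the unchosen option, and an optional time-dependent decay of expected values. A minimal sketch of how such a trial-by-trial update could look is given below; all function and parameter names are hypothetical, and the details (e.g., which learning rate drives the fictitious update, the decay baseline) are assumptions, not the authors' actual implementation.

```python
import math

def update_values(q, choice, reward, alpha_pos, alpha_neg,
                  dt=0.0, decay=0.0, q0=0.5):
    """One trial of a two-option RL model in the spirit of the abstract.

    - Separate learning rates for positive vs. negative prediction errors.
    - 'Fictitious' update: the nonselected option is updated as if it had
      yielded the opposite outcome.
    - Optional exponential decay of both values toward a baseline q0 as a
      function of the time dt elapsed since the previous choice.
    """
    q = list(q)
    other = 1 - choice
    if decay > 0.0:
        # Time-dependent forgetting applied before the value update.
        f = math.exp(-decay * dt)
        q = [q0 + (v - q0) * f for v in q]
    delta = reward - q[choice]                      # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg  # outcome-dependent rate
    q[choice] += alpha * delta
    q[other] += alpha * ((1 - reward) - q[other])   # fictitious update
    return q

def choice_prob(q, beta=3.0):
    """Softmax probability of choosing option 0 over option 1."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
```

For example, starting from equal values and a rewarded choice of option 0, the chosen value rises while the unchosen value falls, so the next choice is biased toward repetition; the `decay` term then pulls both values back toward baseline as the interval between choices grows.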


Figures (EJN-53-2639, g001-g007):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d51/8248175/afe2b7af2bb0/EJN-53-2639-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d51/8248175/cdb358745075/EJN-53-2639-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d51/8248175/a4257ce7fb15/EJN-53-2639-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d51/8248175/cc1e989a2593/EJN-53-2639-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d51/8248175/3765dbb95bb9/EJN-53-2639-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d51/8248175/4b077adc0fb1/EJN-53-2639-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d51/8248175/142b51bba7d9/EJN-53-2639-g007.jpg

Similar articles

1. Time elapsed between choices in a probabilistic task correlates with repeating the same decision.
Eur J Neurosci. 2021 Apr;53(8):2639-2654. doi: 10.1111/ejn.15144. Epub 2021 Mar 2.

2. Value-Based Choice, Contingency Learning, and Suicidal Behavior in Mid- and Late-Life Depression.
Biol Psychiatry. 2019 Mar 15;85(6):506-516. doi: 10.1016/j.biopsych.2018.10.006. Epub 2018 Oct 18.

3. Impaired Expected Value Computations Coupled With Overreliance on Stimulus-Response Learning in Schizophrenia.
Biol Psychiatry Cogn Neurosci Neuroimaging. 2018 Nov;3(11):916-926. doi: 10.1016/j.bpsc.2018.03.014. Epub 2018 Apr 3.

4. Credit Assignment in a Motor Decision Making Task Is Influenced by Agency and Not Sensory Prediction Errors.
J Neurosci. 2018 May 9;38(19):4521-4530. doi: 10.1523/JNEUROSCI.3601-17.2018. Epub 2018 Apr 12.

5. Negative symptoms and the failure to represent the expected reward value of actions: behavioral and computational modeling evidence.
Arch Gen Psychiatry. 2012 Feb;69(2):129-38. doi: 10.1001/archgenpsychiatry.2011.1269.

6. The ubiquity of model-based reinforcement learning.
Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.

7. Separating Probability and Reversal Learning in a Novel Probabilistic Reversal Learning Task for Mice.
Front Behav Neurosci. 2020 Jan 9;13:270. doi: 10.3389/fnbeh.2019.00270. eCollection 2019.

8. Heterogeneity of strategy use in the Iowa gambling task: a comparison of win-stay/lose-shift and reinforcement learning models.
Psychon Bull Rev. 2013 Apr;20(2):364-71. doi: 10.3758/s13423-012-0324-9.

9. Learning reward frequency over reward probability: A tale of two learning rules.
Cognition. 2019 Dec;193:104042. doi: 10.1016/j.cognition.2019.104042. Epub 2019 Aug 17.

10. Mice learn to avoid regret.
PLoS Biol. 2018 Jun 21;16(6):e2005853. doi: 10.1371/journal.pbio.2005853. eCollection 2018 Jun.

Cited by

1. Non-motor symptoms associated with progressive loss of dopaminergic neurons in a mouse model of Parkinson's disease.
Front Neurosci. 2024 Apr 30;18:1375265. doi: 10.3389/fnins.2024.1375265. eCollection 2024.

References

1. Ten simple rules for the computational modeling of behavioral data.
Elife. 2019 Nov 26;8:e49547. doi: 10.7554/eLife.49547.

2. The Fat Mass and Obesity-Associated Protein (FTO) Regulates Locomotor Responses to Novelty via D2R Medium Spiny Neurons.
Cell Rep. 2019 Jun 11;27(11):3182-3198.e9. doi: 10.1016/j.celrep.2019.05.037.

3. Selective Effects of the Loss of NMDA or mGluR5 Receptors in the Reward System on Adaptive Decision-Making.
eNeuro. 2018 Oct 5;5(4). doi: 10.1523/ENEURO.0331-18.2018. eCollection 2018 Jul-Aug.

4. An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals.
Nat Commun. 2018 Jun 26;9(1):2477. doi: 10.1038/s41467-018-04840-2.

5. A molecular mechanism for choosing alcohol over an alternative reward.
Science. 2018 Jun 22;360(6395):1321-1326. doi: 10.1126/science.aao1157.

6. Choice for Drug or Natural Reward Engages Largely Overlapping Neuronal Ensembles in the Infralimbic Prefrontal Cortex.
J Neurosci. 2018 Apr 4;38(14):3507-3519. doi: 10.1523/JNEUROSCI.0026-18.2018. Epub 2018 Feb 26.

7. Trying to make sense of rodents' drug choice behavior.
Prog Neuropsychopharmacol Biol Psychiatry. 2018 Dec 20;87(Pt A):3-10. doi: 10.1016/j.pnpbp.2017.09.027. Epub 2017 Sep 28.

8. Activity patterns of serotonin neurons underlying cognitive flexibility.
Elife. 2017 Mar 21;6:e20552. doi: 10.7554/eLife.20552.

9. Working Memory Load Strengthens Reward Prediction Errors.
J Neurosci. 2017 Apr 19;37(16):4332-4342. doi: 10.1523/JNEUROSCI.2700-16.2017. Epub 2017 Mar 20.

10. The neural basis of reversal learning: An updated perspective.
Neuroscience. 2017 Mar 14;345:12-26. doi: 10.1016/j.neuroscience.2016.03.021. Epub 2016 Mar 12.