Institut des Systèmes Intelligents et de Robotique, Sorbonne Université, CNRS, F-75005, Paris, France.
CNRS, Institut de Neurosciences Cognitives et Intégratives d'Aquitaine (INCIA, UMR 5287), Bordeaux, France.
Sci Rep. 2019 May 1;9(1):6770. doi: 10.1038/s41598-019-43245-z.
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has been proposed on theoretical grounds that dopamine contributes to the control of this exploration-exploitation trade-off; specifically, the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparisons of several computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model), each fitted to individual animals, confirm that, independently of the model, decreasing dopaminergic activity does not affect the learning rate but is equivalent to an increase in the random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.
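The formal relationship invoked above can be illustrated with a minimal sketch (this is an illustration of the general softmax argument, not the paper's code): in a softmax choice rule, multiplying the action values by a gain factor kappa, as a uniform rescaling of positive reward prediction errors would do to the learned values at convergence, yields exactly the same choice probabilities as multiplying the inverse temperature beta by kappa. A reduced dopaminergic gain (kappa < 1) therefore acts like a lower beta, i.e. more random exploration, without touching the learning rate.

```python
import numpy as np

def softmax(q, beta):
    """Softmax choice probabilities over action values q with inverse temperature beta."""
    z = beta * np.asarray(q, dtype=float)
    z -= z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

q = np.array([0.2, 0.8])      # hypothetical learned action values
kappa, beta = 0.4, 3.0        # kappa < 1 mimics a reduced dopaminergic gain

# Rescaling the values is identical to rescaling the inverse temperature:
p_scaled_values = softmax(kappa * q, beta)
p_scaled_beta = softmax(q, kappa * beta)
assert np.allclose(p_scaled_values, p_scaled_beta)

# The lower effective beta flattens the distribution: the best option is
# chosen less often, i.e. choices become more random.
assert p_scaled_beta[1] < softmax(q, beta)[1]
```

Under this reading, a dopamine antagonist that scales down positive prediction errors shifts behavior toward uniform random choice while leaving the value-update (learning-rate) machinery intact, which is what the model comparison in the study reports.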