Humphries Mark D, Khamassi Mehdi, Gurney Kevin
Group for Neural Theory, Department d'Etudes Cognitives, École Normale Supérieure Paris, France.
Front Neurosci. 2012 Feb 6;6:9. doi: 10.3389/fnins.2012.00009. eCollection 2012.
We continuously face the dilemma of choosing between actions that gather new information and actions that exploit existing knowledge. This "exploration-exploitation" trade-off depends on the environment: stability favors exploiting knowledge to maximize gains; volatility favors exploring new options and discovering new outcomes. Here we set out to reconcile recent evidence for dopamine's involvement in the exploration-exploitation trade-off with the existing evidence for basal ganglia control of action selection, by testing the hypothesis that tonic dopamine in the striatum, the basal ganglia's input nucleus, sets the current exploration-exploitation trade-off. We first advance the idea of interpreting the basal ganglia output as a probability distribution function for action selection. Using computational models of the full basal ganglia circuit, we show that, under this interpretation, the actions of dopamine within the striatum change the basal ganglia's output to favor the level of exploration or exploitation encoded in the probability distribution. We also find that our models predict that striatal dopamine controls the exploration-exploitation trade-off if we instead read out the probability distribution from the target nuclei of the basal ganglia, where the basal ganglia's inhibitory input shapes the cortical input to these nuclei. Finally, by integrating the basal ganglia within a reinforcement learning model, we show how dopamine's effect on the exploration-exploitation trade-off could be measurable in a forced two-choice task. These simulations also show how tonic dopamine can appear to affect learning while only directly altering the trade-off. Thus, our models support the hypothesis that changes in tonic dopamine within the striatum can alter the exploration-exploitation trade-off by modulating the output of the basal ganglia.
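To make the idea of a tunable probability distribution over actions concrete, the following is a minimal sketch and not the basal ganglia circuit model used in the study. It assumes a standard softmax rule in which a hypothetical inverse-temperature parameter `beta` stands in for the effect attributed to tonic striatal dopamine, and a simple value-learning rule with a fixed learning rate `alpha` for a forced two-choice task. All function names, parameter values, and reward probabilities are illustrative assumptions.

```python
import math
import random


def softmax(values, beta):
    """Turn action values into selection probabilities.

    beta is an assumed inverse temperature: beta = 0 gives a uniform
    distribution (pure exploration); large beta concentrates probability
    on the highest-valued action (exploitation).
    """
    m = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy in bits; higher means more exploratory choice."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def run_two_choice(beta, reward_probs=(0.8, 0.2), trials=500, alpha=0.1, seed=0):
    """Simulate a forced two-choice task with a fixed learning rate alpha.

    Only beta varies between runs, so any difference in performance comes
    from the trade-off setting and not from the learning rule itself.
    Returns the fraction of trials on which option 0 (the richer one
    under the default reward_probs) was chosen.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # learned action values
    richer_choices = 0
    for _ in range(trials):
        p = softmax(q, beta)
        action = 0 if rng.random() < p[0] else 1
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        q[action] += alpha * (reward - q[action])  # delta-rule update
        richer_choices += action == 0
    return richer_choices / trials


if __name__ == "__main__":
    for beta in (0.0, 1.0, 5.0):
        probs = softmax([0.6, 0.4], beta)
        print(beta, round(entropy(probs), 3), run_two_choice(beta))
```

Because `alpha` is held constant, differences in choice behavior across `beta` values in this sketch arise only from the selection rule, which is one simple way to see how a change in the trade-off could look like a change in learning.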