Division of the Humanities and Social Sciences and Computation and Neural Systems Program, California Institute of Technology, Pasadena, California 91125, USA.
J Neurosci. 2013 Jul 24;33(30):12519-27. doi: 10.1523/JNEUROSCI.1353-13.2013.
Flexible action selection requires knowledge about how alternative actions impact the environment: a "cognitive map" of instrumental contingencies. Reinforcement learning theories formalize this map as a set of stochastic relationships between actions and states, such that for any given action considered in a current state, a probability distribution is specified over possible outcome states. Here, we show that activity in the human inferior parietal lobule correlates with the divergence of such outcome distributions (a measure that reflects whether discrimination between alternative actions increases the controllability of the future) and, further, that this effect is dissociable from those of other information-theoretic and motivational variables, such as outcome entropy, action values, and outcome utilities. Our results suggest that, although ultimately combined with reward estimates to generate action values, outcome probability distributions associated with alternative actions may be contrasted independently of valence computations, to narrow the scope of the action selection problem.
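As a rough illustration of the two information-theoretic quantities named above, the following sketch computes outcome entropy and a divergence between the outcome distributions of alternative actions. It assumes the Jensen-Shannon divergence with equal weights as the divergence measure; the abstract does not name a specific formula, and the function names and example distributions are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete outcome distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(dists):
    """Jensen-Shannon divergence (in bits) among equally weighted distributions.

    Assumption: each row is one action's probability distribution over the
    same ordered set of outcome states.
    """
    n = len(dists)
    # Mixture distribution: outcome probabilities averaged across actions.
    mixture = [sum(col) / n for col in zip(*dists)]
    # Entropy of the mixture minus the mean entropy of the individual rows.
    return entropy(mixture) - sum(entropy(p) for p in dists) / n


# Hypothetical example: two actions, three possible outcome states.
distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # actions lead to different outcomes
identical = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]]  # actions are interchangeable

print(js_divergence(distinct))   # 1.0
print(js_divergence(identical))  # 0.0
```

In this toy case the divergence is high when choosing between actions changes which outcome occurs and zero when the actions yield the same outcome distribution, even though the second pair has the higher outcome entropy. This is one way to see how the two variables can be dissociated.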