Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, 15213, USA.
Department of Psychology, Penn State University, University Park, PA, 16801, USA.
Nat Commun. 2020 Oct 26;11(1):5407. doi: 10.1038/s41467-020-18864-0.
When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Here we report that the human posterior hippocampus (PH) invigorates exploration while the anterior hippocampus (AH) supports the transition to exploitation on a reinforcement learning task with a spatially structured reward function. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. The AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal-cell-like ramping activity when navigating toward it.
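The abstract contrasts a local signal (the reward prediction error at the option just chosen) with a global one (the location and prominence of the value maximum across the whole space). The sketch below illustrates that distinction with a generic delta-rule learner; it is not the authors' computational model. Everything in it is an assumption chosen for illustration: the hypothetical names `SpatialValueLearner`, `structured_reward`, and `run`, the 1-D space on [0, 1], the Gaussian reward peak at 0.7, the Gaussian basis functions that let one outcome update nearby locations, the softmax choice rule (low `beta` explores, high `beta` exploits), all parameter values, and the definition of "prominence" as peak value minus the mean of the value map.

```python
import numpy as np


def gaussian_basis(x, centers, width):
    """Evaluate Gaussian basis functions at x.

    Returns shape (n_basis,) for a scalar x, or (n_basis, len(x)) for an array.
    """
    x = np.asarray(x, dtype=float)
    return np.exp(-((x[..., None] - centers) ** 2) / (2.0 * width ** 2)).T


def structured_reward(x, peak=0.7, spread=0.1):
    """Hypothetical spatially structured reward: good options cluster around `peak`."""
    return float(np.exp(-((x - peak) ** 2) / (2.0 * spread ** 2)))


class SpatialValueLearner:
    """Illustrative delta-rule learner over a 1-D space (an assumption, not the paper's model)."""

    def __init__(self, n_basis=12, width=0.08, alpha=0.2, grid_size=101):
        self.grid = np.linspace(0.0, 1.0, grid_size)
        self.centers = np.linspace(0.0, 1.0, n_basis)
        self.width = width
        self.alpha = alpha
        self.weights = np.zeros(n_basis)
        self.grid_basis = gaussian_basis(self.grid, self.centers, width)

    def value_map(self):
        # Value at every grid location = weighted sum of the basis functions.
        return self.weights @ self.grid_basis

    def update(self, choice, reward):
        phi = gaussian_basis(choice, self.centers, self.width)
        # Local reward prediction error: outcome minus value predicted at the chosen spot.
        rpe = reward - self.weights @ phi
        # Credit spreads to neighbouring locations through the overlapping basis functions.
        self.weights += self.alpha * rpe * phi
        return rpe

    def value_maximum(self):
        # Global summary: where the value map peaks and how far the peak stands out.
        v = self.value_map()
        i = int(np.argmax(v))
        return self.grid[i], v[i] - v.mean()


def run(n_trials=500, beta=5.0, seed=0):
    rng = np.random.default_rng(seed)
    learner = SpatialValueLearner()
    rpes = []
    for _ in range(n_trials):
        v = learner.value_map()
        # Softmax choice over locations; beta = 0 gives uniform (pure) exploration.
        p = np.exp(beta * (v - v.max()))
        p /= p.sum()
        choice = rng.choice(learner.grid, p=p)
        rpes.append(learner.update(choice, structured_reward(choice)))
    return learner, np.array(rpes)


learner, rpes = run()
location, prominence = learner.value_maximum()
print(f"value maximum at {location:.2f}, prominence {prominence:.2f}")
```

In this toy setting the trial-by-trial prediction errors carry only local information, while the learned value map summarizes outcomes across many trials into a single peak whose location and prominence can guide a shift from exploring to exploiting.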