School of Psychology, Georgia Institute of Technology, Atlanta, USA.
School of Economics, Georgia Institute of Technology, Atlanta, USA.
Sci Rep. 2022 Aug 17;12(1):13923. doi: 10.1038/s41598-022-18245-1.
Reinforcement learning (RL) models have been influential in characterizing human learning and decision making, but few studies apply them to human spatial navigation, and even fewer systematically compare RL models under different navigation requirements. Because RL can quantitatively and continuously characterize both a person's learning strategies and the consistency with which those strategies are used, it offers a novel and important perspective for understanding the marked individual differences in human navigation and for disentangling navigation strategies from navigation performance. One hundred fourteen participants completed wayfinding tasks in a virtual environment in which different phases manipulated navigation requirements. We compared the performance of five RL models (three model-free, one model-based, and one "hybrid") at fitting navigation behaviors in the different phases. Consistent with implications from the prior literature, the hybrid model provided the best fit regardless of navigation requirements, suggesting that the majority of participants rely on a blend of model-free (route-following) and model-based (cognitive-mapping) learning in such navigation scenarios. Furthermore, consistent with a key prediction, the hybrid model showed a correlation between the weight on model-based learning (i.e., navigation strategy) and the navigator's exploration vs. exploitation tendency (i.e., consistency in using that strategy), and this correlation was modulated by navigation task requirements. Together, we not only show how computational findings from RL align with the spatial navigation literature, but also reveal how the relationship between navigation strategy and a person's consistency in using that strategy changes as navigation requirements change.
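The abstract does not give the authors' exact model equations, but hybrid models of this kind are commonly formulated as a weighted mixture of model-based and model-free action values passed through a softmax choice rule, where the mixture weight `w` captures the navigation strategy and the inverse temperature `beta` captures the exploitation vs. exploration tendency. The following minimal sketch illustrates that standard formulation; the function names and example values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hybrid_action_values(q_mf, q_mb, w):
    """Blend model-free and model-based action values.

    w = 1 -> purely model-based (cognitive mapping);
    w = 0 -> purely model-free (route following).
    """
    return w * np.asarray(q_mb) + (1.0 - w) * np.asarray(q_mf)

def softmax_policy(q, beta):
    """Softmax choice rule; larger beta means more exploitation
    (consistent use of the strategy), smaller beta means more exploration."""
    z = beta * (q - np.max(q))  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical example: two candidate directions at a junction.
q_mf = [0.2, 0.8]  # value estimates from model-free (route) learning
q_mb = [0.9, 0.1]  # value estimates from model-based (map) planning
q = hybrid_action_values(q_mf, q_mb, w=0.7)  # strategy weighted toward mapping
probs = softmax_policy(q, beta=3.0)          # choice probabilities per direction
```

In model fitting, `w` and `beta` would be estimated per participant by maximizing the likelihood of observed choices, which is what allows strategy (w) and consistency (beta) to be examined separately.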