Chen Wei James, Krajbich Ian
Department of Economics, The Ohio State University, Columbus, OH 43210.
Proc Natl Acad Sci U S A. 2017 May 2;114(18):4637-4642. doi: 10.1073/pnas.1618161114. Epub 2017 Apr 17.
Models of reinforcement learning (RL) are prevalent in the decision-making literature, but not all behavior seems to conform to the gradual convergence that is a central feature of RL. In some cases learning seems to happen all at once. Limited prior research on these "epiphanies" has shown evidence of sudden changes in behavior, but it remains unclear how such epiphanies occur. We propose a sequential-sampling model of epiphany learning (EL) and test it using an eye-tracking experiment. In the experiment, subjects repeatedly play a strategic game that has an optimal strategy. Subjects can learn over time from feedback but are also allowed to commit to a strategy at any time, eliminating all other options and opportunities to learn. We find that the EL model is consistent with the choices, eye movements, and pupillary responses of subjects who commit to the optimal strategy (correct epiphany) but not always of those who commit to a suboptimal strategy or who do not commit at all. Our findings suggest that EL is driven by a latent evidence accumulation process that can be revealed with eye-tracking data.
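The abstract does not specify the EL model, so the following is only a generic illustration of what a sequential-sampling (evidence accumulation) process looks like, not the authors' implementation. It assumes that noisy evidence is added on each round of play and that a commitment happens the first time the running total crosses a fixed threshold. The function name `simulate_commit_trial` and the parameters `drift`, `noise`, `threshold`, and `max_trials` are hypothetical and chosen for illustration.

```python
import random


def simulate_commit_trial(drift=0.1, noise=1.0, threshold=5.0,
                          max_trials=200, seed=0):
    """Return the first trial on which accumulated evidence reaches the
    threshold, or None if it never does within max_trials.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    evidence = 0.0
    for trial in range(1, max_trials + 1):
        # Each trial adds a constant drift plus Gaussian noise.
        evidence += drift + rng.gauss(0.0, noise)
        if evidence >= threshold:
            return trial  # sudden, all-at-once switch in behavior
    return None  # no commitment within the allotted trials


if __name__ == "__main__":
    print(simulate_commit_trial())
```

In a sketch like this, behavior changes abruptly at the crossing trial even though the underlying evidence builds up gradually, which is the kind of contrast with gradual RL convergence that the abstract describes.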