Air Force Research Laboratory, AFB, OH, USA.
Cogn Sci. 2013 May-Jun;37(4):757-74. doi: 10.1111/cogs.12034. Epub 2013 Mar 29.
Reinforcement learning (RL) models of decision-making cannot account for human decisions made in the absence of prior reward or punishment. We propose a mechanism for choosing among available options based on goal-option association strengths, where association strengths between objects represent previously experienced object proximity. The proposed mechanism, Goal-Proximity Decision-making (GPD), is implemented within the ACT-R cognitive framework. GPD is found to be more efficient than RL in three maze-navigation simulations, and its advantage over RL appears to grow as task difficulty increases. An experiment is presented in which participants were asked to make choices in the absence of prior reward; GPD captures human performance in this experiment better than RL.
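For illustration only: the paper implements GPD inside ACT-R's declarative memory, but the core choice rule described above can be sketched as standalone Python. All names below (GoalProximityAgent, observe, choose) are hypothetical, and the sketch assumes a simple co-occurrence count as the association-strength measure:

    import random
    from collections import defaultdict

    class GoalProximityAgent:
        """Minimal GPD sketch: association strengths grow when objects
        are experienced in proximity; choices favor the option most
        strongly associated with the current goal. No reward signal
        is ever consulted, in contrast with RL."""

        def __init__(self):
            # strength of association between an unordered object pair
            self.strength = defaultdict(float)

        def observe(self, objects, increment=1.0):
            """Strengthen pairwise associations among co-present objects."""
            for i, a in enumerate(objects):
                for b in objects[i + 1:]:
                    self.strength[frozenset((a, b))] += increment

        def choose(self, options, goal):
            """Pick the option most associated with the goal;
            ties (e.g., no experience at all) break at random."""
            best = max(self.strength[frozenset((o, goal))] for o in options)
            return random.choice(
                [o for o in options
                 if self.strength[frozenset((o, goal))] == best])

A toy maze-style usage: after the agent merely sees the goal object near one door, it prefers that door on its first rewardless choice.

    agent = GoalProximityAgent()
    agent.observe(["door_left", "cheese"])  # cheese seen near the left door
    agent.choose(["door_left", "door_right"], goal="cheese")  # -> "door_left"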