School of Computing, University of Leeds, Leeds, West Yorkshire, UK.
Neuroscience and Psychiatry Unit, University of Manchester, Manchester, UK.
Front Neurosci. 2014 Feb 21;8:30. doi: 10.3389/fnins.2014.00030. eCollection 2014.
Computational models of learning have proved largely successful in characterizing potential mechanisms that allow humans to make decisions in uncertain and volatile contexts. We report findings that extend existing knowledge and show that a modified reinforcement learning model, with separate learning parameters depending on whether the previous trial delivered a reward or a punishment, provides the best fit to human behavior in decision making under uncertainty. Specifically, we examined how well this modified reinforcement learning model fit human behavioral data in a probabilistic two-alternative decision-making task with rule reversals. Our results demonstrate that the model predicted human behavior better than a series of other models based on reinforcement learning or Bayesian reasoning. Unlike the Bayesian models, the modified reinforcement learning model contains no explicit representation of rule switches. When the task is treated purely as a machine learning problem, with the aim of gaining as many rewards as possible rather than describing human behavior, the modified reinforcement learning and Bayesian methods perform similarly. Others have used various computational models to describe human behavior in similar tasks; however, we are not aware of any that have compared Bayesian reasoning with reinforcement learning modified to differentiate between rewards and punishments.
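The core mechanism the abstract describes is easy to illustrate. The Python sketch below shows one plausible form of such a model, not the paper's actual implementation or fitted parameters: a delta-rule value update on a two-alternative probabilistic task with periodic rule reversals, where the learning rate applied on each trial depends on whether the outcome was a reward or a punishment. All names and values here (alpha_reward, alpha_punish, beta, p_correct, reversal_every) are illustrative assumptions.

    import math
    import random

    def softmax_choice(q, beta):
        """Pick an action with probability proportional to exp(beta * Q)."""
        weights = [math.exp(beta * v) for v in q]
        r = random.random() * sum(weights)
        for action, w in enumerate(weights):
            r -= w
            if r <= 0:
                return action
        return len(q) - 1

    def run_modified_rl(n_trials=200, alpha_reward=0.3, alpha_punish=0.6,
                        beta=5.0, p_correct=0.8, reversal_every=50):
        """Two-alternative probabilistic task with rule reversals.

        The update uses a different learning rate depending on whether
        the outcome was a reward (+1) or a punishment (-1); parameter
        names and values are illustrative, not the paper's fits.
        """
        q = [0.0, 0.0]      # action values for the two options
        correct = 0         # index of the currently rewarded option
        total_reward = 0.0
        for t in range(n_trials):
            if t > 0 and t % reversal_every == 0:
                correct = 1 - correct                # rule reversal
            choice = softmax_choice(q, beta)
            # The "correct" option is rewarded with probability p_correct;
            # the other option is rewarded with probability 1 - p_correct.
            rewarded = (choice == correct) == (random.random() < p_correct)
            outcome = 1.0 if rewarded else -1.0
            total_reward += outcome
            alpha = alpha_reward if rewarded else alpha_punish
            q[choice] += alpha * (outcome - q[choice])  # delta-rule update
        return total_reward

A standard reinforcement learning model would use a single alpha for both outcome types; the comparison in the paper turns on whether splitting it into reward- and punishment-specific rates fits human choices better.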