Princeton Neuroscience Institute, Princeton University, Washington Road, Princeton, NJ 08544, USA.
Curr Biol. 2019 Jun 17;29(12):2066-2074.e5. doi: 10.1016/j.cub.2019.05.013. Epub 2019 May 30.
In 1979, Daniel Kahneman and Amos Tversky published a ground-breaking paper titled "Prospect Theory: An Analysis of Decision under Risk," which presented a behavioral economic theory that accounted for the ways in which humans deviate from economists' normative workhorse model, Expected Utility Theory [1, 2]. For example, people exhibit probability distortion (they overweight low probabilities), loss aversion (losses loom larger than gains), and reference dependence (outcomes are evaluated as gains or losses relative to an internal reference point). Using a task in which rats chose between guaranteed and probabilistic rewards, we found that rats exhibited many of these same biases. However, prospect theory assumes stable preferences in the absence of learning, an assumption at odds with alternative frameworks such as animal learning theory and reinforcement learning [3-7]. Rats also exhibited trial history effects, consistent with ongoing learning. A reinforcement learning model in which state-action values were updated by the subjective value of outcomes, as defined by prospect theory, reproduced rats' nonlinear utility and probability weighting functions and also captured trial-by-trial learning dynamics.
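The modeling idea in the abstract, action values updated by the subjective rather than the objective value of outcomes, can be illustrated with a minimal sketch. The abstract does not give the functional forms or parameter values, so everything below is an assumption made for illustration: a power-law utility, a one-parameter Prelec probability weighting function, a delta-rule value update, and a logistic choice rule. All function names and default parameters are hypothetical and are not taken from the paper.

```python
import math


def utility(reward, alpha=0.7):
    """Subjective utility of a non-negative reward (assumed power-law form)."""
    return reward ** alpha


def weight(p, gamma=0.6):
    """Probability weighting (assumed one-parameter Prelec form).

    With gamma < 1, low probabilities are overweighted and
    high probabilities are underweighted.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-((-math.log(p)) ** gamma))


def prospect_value(reward, p, alpha=0.7, gamma=0.6):
    """Static prospect-theory value of a gamble paying `reward` with probability p."""
    return weight(p, gamma) * utility(reward, alpha)


def update_value(q, reward, alpha=0.7, learning_rate=0.1):
    """Delta-rule update of an action value toward the subjective value
    of the outcome actually received (assumed learning rule)."""
    return q + learning_rate * (utility(reward, alpha) - q)


def p_choose_lottery(q_lottery, q_sure, inverse_temp=1.0):
    """Probability of choosing the lottery under a logistic choice rule."""
    return 1.0 / (1.0 + math.exp(-inverse_temp * (q_lottery - q_sure)))
```

In this sketch, a rewarded lottery choice raises the lottery's value through `update_value`, which in turn raises `p_choose_lottery` on the next trial, while a reward omission lowers it. That is one simple way a trial history effect can coexist with a nonlinear utility function.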