Graduate School of Brain Sciences, Tamagawa University, Tokyo, Japan.
Neural Netw. 2012 Nov;35:88-91. doi: 10.1016/j.neunet.2012.08.004. Epub 2012 Aug 24.
An animal's impulsive preference for an immediate reward implies that it may subjectively discount the value of potential future outcomes. Reinforcement learning theory provides an established theoretical framework for maximizing this discounted subjective value, and the framework has been applied successfully in engineering. However, this study identifies a limitation of the framework when it is applied to animal behavior: in some cases, no learning goal exists. Here, a possible learning framework is proposed that is well-posed in all cases and is consistent with the impulsive preference.
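As a minimal sketch of the discounting idea mentioned above, and not the framework proposed in the paper, the following assumes standard exponential discounting, in which a reward delayed by t steps is weighted by gamma**t. The function name `discounted_value`, the reward sizes, the delay, and the two values of gamma are all hypothetical choices made for illustration.

```python
def discounted_value(rewards, gamma):
    """Sum of rewards weighted by gamma**t, where t is the delay in steps.

    Assumes exponential discounting with a factor 0 <= gamma < 1.
    """
    return sum(r * gamma**t for t, r in enumerate(rewards))


# Hypothetical choice: a small reward now versus a larger reward after 5 steps.
immediate = [1.0]
delayed = [0.0, 0.0, 0.0, 0.0, 0.0, 2.0]

# With steep discounting (gamma = 0.8), the immediate reward is valued more.
impulsive = discounted_value(delayed, 0.8) < discounted_value(immediate, 0.8)

# With shallow discounting (gamma = 0.95), the delayed reward is valued more.
patient = discounted_value(delayed, 0.95) > discounted_value(immediate, 0.95)

print(impulsive, patient)
```

Under these assumed numbers, the preference between the same two options reverses as gamma changes, which is the sense in which an impulsive preference can be read as steep subjective discounting.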