Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892-4415.
J Neurosci. 2024 Jun 12;44(24):e1873232024. doi: 10.1523/JNEUROSCI.1873-23.2024.
Reinforcement learning is a theoretical framework that describes how agents learn to select options that maximize rewards and minimize punishments over time. We often make choices, however, to obtain symbolic reinforcers (e.g., money, points) that are later exchanged for primary reinforcers (e.g., food, drink). Although symbolic reinforcers are ubiquitous in our daily lives and widely used in laboratory tasks because they can be motivating, the mechanisms by which they become motivating are less well understood. In the present study, we examined how monkeys learn to make choices that maximize fluid rewards through reinforcement with tokens. The question addressed here is how the value of a state, which is a function of multiple task features (e.g., the current number of accumulated tokens, choice options, task epoch, trials since the last delivery of primary reinforcer), drives value and affects motivation. We constructed a Markov decision process model that computes the value of task states given task features, which we then correlated with the motivational state of the animal. Fixation times, choice reaction times, and abort frequency were all significantly related to the values of task states during the tokens task (n = 5 monkeys, three males and two females). Furthermore, the model makes predictions for how neural responses could change on a moment-by-moment basis relative to changes in the state value. Together, this task and model allow us to capture learning and behavior related to symbolic reinforcement.
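To make the idea of a state value concrete, the following is a minimal sketch of value iteration on a toy token-accumulation Markov decision process. It is not the authors' model: the state here is only the number of tokens held, and the function name `token_value_iteration`, the two actions ("safe" and "risky"), their outcome probabilities, the cash-out rule, and the discount factor are all hypothetical choices made for illustration.

```python
import numpy as np


def token_value_iteration(max_tokens=4, outcomes=None, gamma=0.9,
                          tol=1e-10, max_iter=10_000):
    """Value iteration on a toy token-accumulation MDP (illustrative only).

    State s is the number of tokens held (0 .. max_tokens - 1).
    Reaching max_tokens or more triggers a cash-out: the agent receives a
    primary reward of size max_tokens and the token count resets to 0.
    `outcomes` maps each action name to a list of
    (probability, token_change) pairs. All numbers are assumptions.
    """
    if outcomes is None:
        outcomes = {
            "safe": [(0.75, +1), (0.25, 0)],
            "risky": [(0.5, +2), (0.5, -1)],
        }
    actions = list(outcomes)
    values = np.zeros(max_tokens)
    q = np.zeros((max_tokens, len(actions)))

    for _ in range(max_iter):
        q = np.zeros((max_tokens, len(actions)))
        for s in range(max_tokens):
            for a, name in enumerate(actions):
                for prob, change in outcomes[name]:
                    nxt = max(s + change, 0)  # tokens cannot go below zero
                    if nxt >= max_tokens:
                        # Cash-out: tokens exchanged for primary reward.
                        reward, nxt = float(max_tokens), 0
                    else:
                        reward = 0.0
                    # Bellman backup: immediate reward plus discounted
                    # value of the successor state.
                    q[s, a] += prob * (reward + gamma * values[nxt])
        new_values = q.max(axis=1)
        converged = np.max(np.abs(new_values - values)) < tol
        values = new_values
        if converged:
            break

    # Greedy policy with respect to the final action values.
    policy = [actions[i] for i in q.argmax(axis=1)]
    return values, policy


if __name__ == "__main__":
    state_values, greedy_policy = token_value_iteration()
    for tokens, (v, act) in enumerate(zip(state_values, greedy_policy)):
        print(f"tokens={tokens}  value={v:.3f}  greedy action={act}")
```

In a fuller model of the kind the abstract describes, the state would also include features such as the choice options on offer, the task epoch, and the trials since the last delivery of primary reinforcer. The resulting state values could then be compared against behavioral measures such as fixation times, choice reaction times, and abort frequency.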